We consider higher order frequentist inference for the parametric
component of a semiparametric model based on sampling from the
posterior profile distribution. The first order validity of this procedure
established by Lee, Kosorok and Fine in [J. American Statist. Assoc.
100 (2005) 960-969] is extended to second-order validity in the setting
where the infinite-dimensional nuisance parameter achieves the
parametric rate. Specifically, we obtain higher order estimates of the
maximum profile likelihood estimator and of the efficient Fisher
information. Moreover, we prove that an exact frequentist confidence
interval for the parametric component at level $\alpha$ can be estimated
by the $\alpha$-level credible set from the profile sampler with an error of
order $O_P(n^{-1})$. Simulation studies are used to assess second-order
asymptotic validity of the profile sampler. As far as we are aware,
these are the first higher order accuracy results for semiparametric
frequentist inference.

1. Introduction. The focus of this paper is on higher order frequentist
inference for the parametric component of a semiparametric model. In
addition to the $d$-dimensional Euclidean parameter $\theta$, semiparametric
models also have an infinite-dimensional parameter $\eta$, sometimes called the
"nuisance" parameter. A classic example is the Cox proportional hazards model
for right-censored survival data [4], where interest is focused on the log
hazard ratios $\theta$ for the regression covariate vector $z$. The integrated
baseline hazard function $\eta$ is the infinite-dimensional nuisance parameter.
The involvement of an infinite-dimensional nuisance parameter in semiparametric
models generally complicates maximum likelihood inference for $\theta$. In
particular, estimating the limiting variance of
$\sqrt{n}(\hat\theta_n - \theta_0)$, where $\theta_0$ is the true
value of $\theta$, usually requires estimating an infinite-dimensional operator.

Related studies of higher order frequentist inference in parametric
models under the Bayesian set-up focus on the choice of priors,
such as objective priors [30]. However, extending the objective prior
approach to the semiparametric setting seems to require a
higher-than-second order expansion of the profile likelihood and appears to be
quite difficult. A similar hurdle appears to arise in extending the higher
order bootstrap results for parametric models [8] to the semiparametric
setting. Interestingly, general first-order bootstrap results for semiparametric
M-estimators have only recently been developed (see [31] and [18]). Higher
order extensions for any of these approaches would be very useful. However,
in this paper, we will pursue an apparently simpler approach to obtaining
higher order likelihood inference for semiparametric models based on the
profile sampler proposed in [17].

The profile sampler provides a first-order correct approximation of the
maximum likelihood estimator $\hat\theta_n$ and consistent estimation of the
efficient Fisher information for $\theta$ based on sampling from the posterior
of the profile likelihood, even when the nuisance parameter is not estimable
at the $\sqrt{n}$ rate. The validity of the profile sampler relies on special
properties of the profile likelihood in semiparametric models, some of which
are extensively studied in [20, 21] and [22]. The profile likelihood for the
parameter $\theta$ is $pl_n(\theta) = \sup_\eta \mathrm{lik}_n(\theta,\eta)$,
where $\mathrm{lik}_n(\theta,\eta)$ is the full likelihood given $n$
observations. We also define
$\hat\eta_\theta = \arg\max_\eta \mathrm{lik}_n(\theta,\eta)$. The maximum
likelihood estimator for the full likelihood is thus
$(\hat\theta_n, \hat\eta_n)$, where $\hat\eta_n = \hat\eta_{\hat\theta_n}$.
Consideration of the profile likelihood in frequentist inference about
$\theta$ can be traced back to the ordinary parametric model. An intuitive
interpretation for the validity of the profile likelihood in semiparametric
models is that it can be viewed as an estimator of the least favorable
submodel for the estimation of $\theta$ [25]. The least favorable submodel,
which will be briefly introduced in the next section, is the closest
parametric model to the semiparametric model in the sense of information. In
practice, the profile likelihood can often be easily computed using procedures
such as the stationary point algorithm (as used in, e.g., [14]) or the
iterative convex minorant algorithm introduced in [7] to find
$\hat\eta_\theta$ if $\eta$ is a monotone function.

An advantage of the profile sampler is that a prior on the infinite-dimensional
parameter is not required to obtain valid frequentist inference about $\theta$.
Assigning a prior on $\eta$ can be quite challenging since for some models, there
is no direct extension of the concept of a Lebesgue dominating measure for
the infinite-dimensional parameter set involved [15]. The fully Bayesian
approach can obviously be the basis for inference on $\theta$ alone via the
marginal posterior. The first-order valid results in [26] indicate that the
marginal semiparametric posterior is asymptotically normal and centered at the
corresponding maximum likelihood estimator or posterior mean, with covariance
matrix equal to the inverse of the efficient Fisher information. Unfortunately,
this marginal approach does not circumvent the need to specify a prior on
$\eta$, with all of the difficulties that entails.

The main contribution of this paper is the development of higher order
frequentist inference for the parametric component of a semiparametric model
through the profile sampler procedure proposed in [17], obtained by imposing
stronger assumptions on the semiparametric model and prior. We assume that the
nuisance parameters of the semiparametric models studied in this paper have
the parametric rate. This assumption permits the treatment of the likelihood
as essentially parametric in certain aspects. This enables the second-order
frequentist inference results for parametric models to be naturally extended
to the semiparametric setting, although we note that considerable technical
difficulties are present despite this simplification. To accomplish the above
higher order inference, we require stricter, but still reasonable, regularity
conditions than those imposed by [22] on the least favorable submodel. This
is reviewed in Section 2. The initial technical step, presented in Section 3, is
to establish higher order versions of expansions (5)-(6) in [22]. In Section 4,
we find that the mean (median) value and the inverse of the variance of the
MCMC chain from the profile sampler are actually higher order estimates
of the maximum likelihood estimator and the efficient Fisher information,
respectively. The main result of Section 4 is to prove that an exact
frequentist confidence interval for $\theta_0$ can be estimated by the credible
set from the profile sampler with an error of order only $O_P(n^{-1})$. In
Section 5, we discuss three examples and some simulation results. Section 5 is
followed by a discussion in Section 6 of future research interests. We
postpone most of the technical details to the proofs given in Section 7.

As far as we are aware, these are the first higher order accuracy results for
semiparametric frequentist inference. This is quite distinct from the concept
of second-order efficiency in semiparametric models (see [9] and [5]), which we
do not consider in this paper. The two important tools we use in this paper
are empirical processes and sandwich techniques [22], with which
we establish upper and lower bounds for the error in the profile log-likelihood
expansion. For ease of exposition, we assume throughout the paper that
$\theta \in \mathbb{R}^1$. However, the results can be readily extended to
higher dimensions. The confidence "interval" and credible set for
$d$-dimensional $\theta$ are a rectangle, a cuboid and a hypercuboid when
$d = 2$, $d = 3$ and $d \ge 4$, respectively.

2. Preliminaries. We assume the data $X_1, \ldots, X_n$ are i.i.d. throughout
the paper. The sample space $\mathcal{X}$ will depend on the semiparametric
model, which is defined by a density
$\{p_{\theta,\eta}(x) : \theta \in \Theta, \eta \in \mathcal{H}\}$, where
$\mathcal{H}$ is an arbitrary subset that will typically be
infinite-dimensional. We first review the concept of a least favorable
submodel and then present some notation and assumptions that will be used
throughout the paper.



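2.1. The least favorable submodel. The score function for $\theta$,
$\dot\ell_{\theta,\eta}$, is defined as the partial derivative with respect to
$\theta$ of the log-likelihood given fixed $\eta$. A score function for
$\eta_0$ is of the form

$\frac{\partial}{\partial t}\Big|_{t=0} \log \mathrm{lik}(\theta_0, \eta_t) \equiv A_{\theta_0,\eta_0} h$,

where $h$ is a "direction" by which $\eta_t \in \mathcal{H}$ approaches
$\eta_0$, running through some index set $H$.
$A_{\theta,\eta} : H \mapsto L_2^0(P_{\theta,\eta})$ is the score operator for
$\eta$. The efficient score function for $\theta$ is defined as
$\tilde\ell_{\theta,\eta} = \dot\ell_{\theta,\eta} - \Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$,
where $\Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$ minimizes the squared distance
$P_{\theta,\eta}(\dot\ell_{\theta,\eta} - k)^2$ over all functions $k$ in the
closed linear space of the score functions for $\eta$ (the "nuisance scores").
The variance of $\tilde\ell_{\theta,\eta}$, called the efficient information
matrix, $\tilde I_{\theta,\eta}$, is the Cramer-Rao bound for estimating
$\theta$ in the presence of the infinite-dimensional nuisance parameter
$\eta$. We denote $\tilde\ell_{\theta_0,\eta_0}$ and $\tilde I_{\theta_0,\eta_0}$
by $\tilde\ell_0$ and $\tilde I_0$, respectively.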

A submodel $t \mapsto p_{t,\eta_t}$ is defined to be least favorable at
$(\theta,\eta)$ if
$\tilde\ell_{\theta,\eta} = \partial/\partial t \log p_{t,\eta_t}$, given
$t = \theta$. The "direction" along which $\eta_t$ approaches $\eta$ in
the least favorable submodel is called the least favorable direction.
Generally, the least favorable direction at $(\theta,\eta)$ in semiparametric
models can be obtained by solving for $h_{\theta,\eta}$ in the equation
$P(\dot\ell_{\theta,\eta} - A_{\theta,\eta}h_{\theta,\eta})A_{\theta,\eta}h = 0$
(for every direction $h$) by the projection principle, and is usually in the
form of a conditional expectation. Section 2 in [22] provides an excellent
guideline for searching for a least favorable submodel. Since the projection
$\Pi_{\theta,\eta}\dot\ell_{\theta,\eta}$ on the closed linear span of the
nuisance scores is not necessarily a nuisance score itself, the
least favorable submodel may not always exist. However, we assume that in
our setting a least favorable submodel always exists or can be approximated
sufficiently closely by an approximately least favorable submodel. An
insightful review of least favorable submodels and efficient score functions
can be found in Chapter 3 of [13]. Systematic coverage of semiparametric
efficiency theory can be found in [1] and [2].

The least favorable submodel in this paper will be constructed in the
following manner. We consider a general map from the neighborhood of
$\theta$ into the parameter set for $\eta$, denoted by
$t \mapsto \eta_t(\theta,\eta)$. Then, the map
$t \mapsto \ell(t,\theta,\eta)(x)$ can be defined as follows:

(1)  $\ell(t,\theta,\eta)(x) = \log \mathrm{lik}(t, \eta_t(\theta,\eta))(x)$.

The details of this map will depend on the situation.

2.2. Notation and assumptions. The dependence on $x \in \mathcal{X}$ of the
likelihood and score quantities will be largely suppressed for clarity in this
section and hereafter. $\dot\ell(t,\theta,\eta)$, $\ddot\ell(t,\theta,\eta)$
and $\ell^{(3)}(t,\theta,\eta)$ denote the first, second and third derivatives
of $\ell(t,\theta,\eta)$ with respect to $t$, respectively. For brevity, we
write $\dot\ell_0 = \dot\ell(\theta_0,\theta_0,\eta_0)$,
$\ddot\ell_0 = \ddot\ell(\theta_0,\theta_0,\eta_0)$ and
$\ell_0^{(3)} = \ell^{(3)}(\theta_0,\theta_0,\eta_0)$, where $\theta_0$
and $\eta_0$ are the true values of $\theta$ and $\eta$, respectively. Based on
the definition of the least favorable submodel, $\dot\ell_0$ is just
$\tilde\ell_0$ defined above. $\ell_\theta(t,\theta,\eta)$ indicates the first
derivative of $\ell(t,\theta,\eta)$ w.r.t. $\theta$. Similarly,
$\ell_{t,\theta}(t,\theta,\eta)$ denotes the derivative of
$\dot\ell(t,\theta,\eta)$ w.r.t. $\theta$. Also, $\ell_{t,t}(\theta)$ and
$\ell_{t,\theta}(\eta)$ indicate the maps
$\theta \mapsto \ddot\ell(t,\theta,\eta)$ and
$\eta \mapsto \ell_{t,\theta}(t,\theta,\eta)$, respectively. Let $\varrho_n$
denote $(\theta - \hat\theta_n)\tilde I_0^{1/2}$ and let $\phi(\cdot)$
($\Phi(\cdot)$) represent the density (cumulative distribution) of a standard
normal random variable. $\gtrsim$ and $\lesssim$ mean greater than, or smaller
than, up to a universal constant. Define $x \vee y$ ($x \wedge y$) to be the
maximum (minimum) value of $x$ and $y$.

$\mathbb{P}_n$ and $\mathbb{G}_n$ are used to denote the empirical distribution
and the empirical process of the observations, respectively. Furthermore, we
use the operator notation for evaluating expectation. Thus, for every
measurable function $f$ and true probability $P$,

$\mathbb{P}_n f = \frac{1}{n}\sum_{i=1}^n f(X_i)$,  $Pf = \int f\,dP$  and  $\mathbb{G}_n f = \frac{1}{\sqrt{n}}\sum_{i=1}^n (f(X_i) - Pf)$.
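
As a concrete reading of this operator notation, the following minimal Python sketch evaluates $\mathbb{P}_n f$ and $\mathbb{G}_n f$ on a finite sample. The function names `empirical_measure` and `empirical_process` are illustrative and do not come from the paper, and the true mean $Pf$ is passed in as an assumed known quantity, which is only possible in a toy or simulation setting.

```python
import math


def empirical_measure(f, sample):
    """P_n f = n^{-1} * sum_i f(X_i)."""
    return sum(f(x) for x in sample) / len(sample)


def empirical_process(f, sample, true_mean):
    """G_n f = n^{-1/2} * sum_i (f(X_i) - P f); P f is supplied by the caller."""
    return sum(f(x) - true_mean for x in sample) / math.sqrt(len(sample))
```

By construction, `empirical_process(f, sample, Pf)` equals $\sqrt{n}(\mathbb{P}_n f - Pf)$, which is the scaling used throughout the expansions in this paper.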

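We now make the following assumptions:

1. $\theta_0 \in \Theta \subset \mathbb{R}^1$, where $\Theta$ is a compact set and $\theta_0$ is an interior point of $\Theta$;

2. $\eta_\theta(\theta,\eta) = \eta$ for any $(\theta,\eta) \in \Theta \times \mathcal{H}$;

3. $\|\hat\eta_{\tilde\theta_n} - \eta_0\| = O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)$ when $\tilde\theta_n = \theta_0 + o_P(1)$ for some norm $\|\cdot\|$;

4. the maps

$(t,\theta,\eta) \mapsto \frac{\partial^{l+m}}{\partial t^l\,\partial\theta^m}\ell(t,\theta,\eta)$

have integrable envelope functions in $L_1(P)$ in some neighborhood of
$(\theta_0,\theta_0,\eta_0)$ for $(l,m) = (0,0), (1,0), (2,0), (3,0), (1,1), (1,2), (2,1)$;

5. there exists some neighborhood $V$ of $(\theta_0,\theta_0,\eta_0)$ in
$\Theta \times \Theta \times \mathcal{H}$ such that the classes of functions
$\{\ddot\ell(t,\theta,\eta)(x) : (t,\theta,\eta) \in V\}$ and
$\{\ell_{t,\theta}(t,\theta,\eta)(x) : (t,\theta,\eta) \in V\}$ are P-Donsker and
$\{\ell^{(3)}(t,\theta,\eta)(x) : (t,\theta,\eta) \in V\}$ is P-Glivenko-Cantelli;

6.

(2)  $\mathbb{G}_n(\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell_0) = O_P(\|\eta - \eta_0\|)$,

(3)  $P\ddot\ell(\theta_0,\theta_0,\eta) - P\ddot\ell(\theta_0,\theta_0,\eta_0) = O(\|\eta - \eta_0\|)$,

(4)  $P\ell_{t,\theta}(\theta_0,\theta_0,\eta) - P\ell_{t,\theta}(\theta_0,\theta_0,\eta_0) = O(\|\eta - \eta_0\|)$,

(5)  $P\dot\ell(\theta_0,\theta_0,\eta) = O(\|\eta - \eta_0\|^2)$

for all $\eta$ in some neighborhood of $\eta_0$;

7. $\tilde I_0$ is strictly positive.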

Assumption 2 ensures that the least favorable submodel passes through
$(\theta,\eta)$, that is,
$\ell(\theta,\theta,\eta)(x) = \log \mathrm{lik}(\theta,\eta)(x)$. Assumption 3
implicitly assumes that we have a metric or topology defined on the set of
possible values of the nuisance parameter $\eta$. In this paper, uniform and
weak topology norms are applied to the nuisance parameter in different
examples. Definitions of the uniform and weak topology norms will be given in
Section 5. Furthermore, the parametric convergence rate of the nuisance
parameter is needed to obtain our second-order results. Assumption 4 can be
viewed as comprising regular smoothness conditions on the Euclidean parameters
of the least favorable submodel. Assumption 4 implies that
$\ell(t,\theta,\eta)$ is smooth enough in its Euclidean parameter arguments so
that $-P\ddot\ell_0 = P\dot\ell_0^2$. Assumption 4 also implies that
$(\partial/\partial\theta)P\dot\ell(\theta_0,\theta,\eta_0) = 0$ at
$\theta = \theta_0$. Fixing $\eta$ and differentiating
$P_{\theta,\eta}\dot\ell(\theta,\theta,\eta)$ relative to $\theta$ gives
$P_{\theta,\eta}\dot\ell_{\theta,\eta}\dot\ell(\theta,\theta,\eta) + P_{\theta,\eta}\ddot\ell(\theta,\theta,\eta) + (\partial/\partial t)|_{t=\theta}P_{\theta,\eta}\dot\ell(\theta,t,\eta) = 0$,
since $P_{\theta,\eta}\dot\ell(\theta,\theta,\eta) = 0$ for every
$(\theta,\eta)$, and we can choose $(\theta,\eta) = (\theta_0,\eta_0)$.

The assumptions also impose some regular smoothness conditions on
$\ell(t,\theta,\eta)$ relative to $\eta$ in the function space. Condition (2)
involves the continuity modulus of the empirical processes. It can be easily
satisfied if we can show that
$\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell_0$ divided by
$\|\eta - \eta_0\|$ belongs to a P-Donsker class.
The verification methods for (3)-(5) vary for different situations. Assuming
a uniform norm is applied, (3) and (4) are usually satisfied if
$\ell_{t,\theta}(\eta)$ and $\ell_{t,t}(\eta)$ have bounded Frechet derivatives.

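To verify (5), we need to briefly introduce Taylor series in Banach spaces
[32]. Let $\zeta$ be a map from a subset of $\mathbb{D}$ to $\mathbb{E}$, where
$\mathbb{D}$ and $\mathbb{E}$ are both Banach spaces. If we assume
$\zeta(\cdot)$ is second-order Frechet differentiable, then
the Taylor expansion of $\zeta(\vartheta + h)$ around $\vartheta$ can be
written as
$\zeta(\vartheta + h) = \zeta(\vartheta) + \zeta'_\vartheta(h) + \zeta''_{\vartheta + \tau h}(h,h)/2$,
where $\tau \in [0,1]$. $\zeta'_\vartheta(h)$ is just the regular
Frechet derivative of $\zeta(\cdot)$ at the point $\vartheta$ along the
direction $h$ and $\zeta''_\vartheta(h,g)$ is the second-order Frechet
derivative from $\mathbb{D} \times \mathbb{D}$ to $\mathbb{E}$. We can then
write

$P\dot\ell(\theta_0,\theta_0,\eta) = P\left[\frac{\mathrm{lik}(\theta_0,\eta_0) - \mathrm{lik}(\theta_0,\eta)}{\mathrm{lik}(\theta_0,\eta_0)}\left(\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell(\theta_0,\theta_0,\eta_0)\right)\right] - P\left[\dot\ell_0\left(\frac{\mathrm{lik}(\theta_0,\eta) - \mathrm{lik}(\theta_0,\eta_0)}{\mathrm{lik}(\theta_0,\eta_0)} - A_0(\eta - \eta_0)\right)\right]$,

where $A_0 = A_{\theta_0,\eta_0}$ and $A_{\theta,\eta}$ is the score operator
for $\eta$ at $(\theta,\eta)$, for example, the Frechet derivative of
$\log p_{\theta,\eta}$ relative to $\eta$. The above equation holds since
$P_{\theta_0,\eta}\dot\ell(\theta_0,\theta_0,\eta) = 0$ and
$P\dot\ell_0 A_0 h = 0$ for every $h$, by the orthogonality property of
the efficient score function. Note that the boundedness property of
$\zeta''_\vartheta(\cdot,\cdot)$ means that
$\|\zeta''_\vartheta(h,g)\|_{\mathbb{E}} \lesssim \|h\|_{\mathbb{D}}\|g\|_{\mathbb{D}}$.
Thus, under the given regularity conditions, Frechet differentiability of
$\eta \mapsto \dot\ell(\theta_0,\theta_0,\eta)$ plus second-order Frechet
differentiability of $\eta \mapsto \mathrm{lik}(\theta_0,\eta)$ implies (5)
based on the above discussions if the uniform norm is being applied to $\eta$.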

In principle, these smoothness conditions on the least favorable submodel
make the profile likelihood $pl_n(\theta)$ behave asymptotically like a
parametric likelihood. The imposed assumptions are stronger than assumptions
(3.1)-(3.4) in [22], enabling us to achieve higher order asymptotic expansions
for the log-profile likelihood.

3. Second-order asymptotic inference. In this section, we present
second-order asymptotic expansions of the log-profile likelihood which prepare
us for deriving the main results of Section 4 on the higher order structure of
the posterior profile distribution. Some of the results of this section are
useful in their own right for inference about $\theta$. The assumptions of
Section 2 are assumed throughout. We need the following lemma on the behavior
of a random sequence of approximations of $\hat\theta_n$.

Lemma 1. If $\tilde\theta_n$ satisfies $(\tilde\theta_n - \hat\theta_n) = o_P(1)$, then

(6)  $\mathbb{P}_n\dot\ell(\theta_0, \tilde\theta_n, \hat\eta_{\tilde\theta_n}) = \mathbb{P}_n\tilde\ell_0 + O_P(n^{-1/2} + |\tilde\theta_n - \hat\theta_n|)^2$,

(7)  $\mathbb{P}_n\ddot\ell(\tilde\theta_n, \tilde\theta_n, \hat\eta_{\tilde\theta_n}) = P\ddot\ell_0 + O_P(n^{-1/2} + |\tilde\theta_n - \hat\theta_n|)$.

Remark 1. Conditions (6) and (7) can essentially be viewed as the
empirical versions of the no-bias conditions for the least favorable submodel
(see, e.g., Chapter 25 of [28]). We can easily verify (6) and (7) if
$\dot\ell(t,\theta,\eta)$ and $\ddot\ell(t,\theta,\eta)$ are smooth enough in
every argument and the above empirical process assumptions are satisfied.

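The following theorem gives key higher order expansions of the log-profile
likelihood around $\hat\theta_n$ and of the error term in the asymptotic
linearity expansion of $\hat\theta_n$.

Theorem 1. If $\tilde\theta_n$ satisfies $(\tilde\theta_n - \hat\theta_n) = o_P(1)$, then

(8)  $\log pl_n(\tilde\theta_n) = \log pl_n(\hat\theta_n) - \frac{n}{2}(\tilde\theta_n - \hat\theta_n)^2\tilde I_0 + O_P(n|\tilde\theta_n - \hat\theta_n|^3 + n^{-1/2})$,

(9)  $\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \tilde\ell_0(X_i)\tilde I_0^{-1} + O_P(n^{-1/2})$.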

Remark 2. Expansions (8) and (9) are essentially second-order versions
of (6) and (5), respectively, in [22], which have the respective error terms
$o_P(\sqrt{n}|\tilde\theta_n - \theta_0| + 1)^2$ and $o_P(1)$. The parametric
counterparts to (9) can be found in [16].

Remark 3. Expansion (8) can be used to construct an estimator of the
standard error of $\hat\theta_n$, which is called the "discretized" version of the observed






G. CHENG AND M. R. KOSOROK 



profile information, $\hat I_n$, in [21]. Specifically, the discretized version
of the observed profile information is expressed as a discretized second
derivative of the profile likelihood in $\hat\theta_n$ as follows:

(10)  $\hat I_n = -2\,\frac{\log pl_n(\hat\theta_n + s_n) - \log pl_n(\hat\theta_n)}{n s_n^2}$.

Expansion (8) implies that

(11)  $\hat I_n = \tilde I_0 + O_P(|s_n| + n^{-3/2}|s_n|^{-2})$.

In terms of the order of the error term, the theoretically optimal step size
for $\hat I_n$ satisfies $s_n = O_P(n^{-1/2})$ and $s_n^{-1} = O_P(n^{1/2})$.
In that case, $\hat I_n$ is a $\sqrt{n}$-consistent estimator of $\tilde I_0$.



An advantage of the method given in Remark 3 is that we can estimate
$\tilde I_0$ even without an explicit form for the efficient Fisher information
matrix or efficient score function. We only need the form of the profile
likelihood, which is the minimal requirement, to carry out this numerical
differentiation. Formula (11) provides us insight into the relationship between
the step size of numerical differentiation and the convergence rate of
$\hat I_n$. In other words, we can set a specific step size in advance to
achieve the desired convergence rate. This is an improvement on Corollary 3
given in [21], which can only prove the consistency of the observed profile
information.
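
The numerical differentiation in Remark 3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a scalar covariate, no tied times, and uses the Cox log-partial likelihood of Section 5.1 as the profile likelihood. The helper names (`simulate`, `sort_by_time_desc`, `log_pl`, `maximize`, `discretized_information`) and the censoring bound `censor_max` are hypothetical choices made for the example.

```python
import math
import random


def simulate(n, theta, rng, censor_max=2.0):
    """Toy Cox data: Z ~ U[0,1], Lambda(t) = exp(t) - 1, C ~ U[0, censor_max] (assumed)."""
    y, delta, z = [], [], []
    for _ in range(n):
        zi = rng.random()
        t = math.log(1.0 + rng.expovariate(1.0) * math.exp(-theta * zi))
        c = rng.uniform(0.0, censor_max)
        y.append(min(t, c))
        delta.append(1 if t <= c else 0)
        z.append(zi)
    return y, delta, z


def sort_by_time_desc(y, delta, z):
    """Order observations by decreasing follow-up time (assumes no ties)."""
    order = sorted(range(len(y)), key=lambda i: -y[i])
    return [delta[i] for i in order], [z[i] for i in order]


def log_pl(theta, delta, z):
    """Cox log-partial likelihood; inputs must be sorted by decreasing time."""
    total, risk_sum = 0.0, 0.0
    for d, zi in zip(delta, z):
        risk_sum += math.exp(theta * zi)  # running sum over {j : Y_j >= y_i}
        if d:
            total += theta * zi - math.log(risk_sum)
    return total


def score_and_information(theta, delta, z):
    """Analytic first derivative and negative second derivative of log_pl."""
    s0 = s1 = s2 = score = info = 0.0
    for d, zi in zip(delta, z):
        w = math.exp(theta * zi)
        s0, s1, s2 = s0 + w, s1 + zi * w, s2 + zi * zi * w
        if d:
            mean = s1 / s0
            score += zi - mean
            info += s2 / s0 - mean * mean
    return score, info


def maximize(delta, z, theta=0.0, iterations=50):
    """Newton-Raphson iterations for the maximum partial likelihood estimator."""
    for _ in range(iterations):
        score, info = score_and_information(theta, delta, z)
        theta += score / info
    return theta


def discretized_information(theta_hat, delta, z, step):
    """-2 {log pl_n(theta_hat + s) - log pl_n(theta_hat)} / (n s^2)."""
    n = len(z)
    diff = log_pl(theta_hat + step, delta, z) - log_pl(theta_hat, delta, z)
    return -2.0 * diff / (n * step * step)
```

With a step size proportional to $n^{-1/2}$, for example `discretized_information(theta_hat, d, zz, step=len(zz) ** -0.5)`, the result can be compared against the analytic observed information divided by $n$, which is available in this toy model but not in general.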



4. Main results. We now present the main results on the posterior profile
distribution. Let $\tilde P_{\theta|\tilde X}$ be the posterior profile
distribution of $\theta$ w.r.t. the prior $\rho(\theta)$ given data
$\tilde X = (X_1, \ldots, X_n)$. Define
$\Delta_n(\theta) = n^{-1}\{\log pl_n(\theta) - \log pl_n(\hat\theta_n)\}$.
A preliminary result, Theorem 2 with Corollaries 1 and 2 below, shows that the
normal approximation to the posterior is second-order accurate for the
cumulative distribution, the density and for the moments.
The main result, Theorem 3, shows that the posterior profile distribution
can be used to achieve second-order accurate frequentist inference.



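Theorem 2. Assume the above assumptions and that

(12)  $\Delta_n(\tilde\theta_n) = o_P(1)$ implies $\tilde\theta_n = \theta_0 + o_P(1)$.

If the proper prior satisfies $\rho(\theta_0) > 0$ and $\rho(\cdot)$ has a
continuous and finite first-order derivative in some neighborhood of
$\theta_0$, then we have, for $-\infty < \xi < \infty$,

(13)  $\tilde P_{\theta|\tilde X}(\sqrt{n}(\theta - \hat\theta_n)\tilde I_0^{1/2} \le \xi) = \Phi(\xi) + O_P(n^{-1/2})$.

We note that the general theory concerning asymptotic expansions of
posterior distributions in parametric models can be found in [11]. We also
note that Theorem 1 in [17] implies the following:

(14)  $\tilde P_{\theta|\tilde X}(\sqrt{n}(\theta - \hat\theta_n)\tilde I_0^{1/2} \le \xi) = \Phi(\xi) + o_P(1)$.

Clearly, (14) is a first-order version of (13). A possibly more practical
version of (13) is

(15)  $\tilde P_{\theta|\tilde X}(\sqrt{n}(\theta - \hat\theta_n)\hat I_n^{1/2} \le \xi) = \Phi(\xi) + O_P(n^{-1/2})$,

where $\hat I_n$ can be estimated using (11) with an appropriate step size, for
example, $s_n = O_P(n^{-1/2})$ and $s_n^{-1} = O_P(n^{1/2})$. Thus, we can
construct the one-sided/two-sided credible set for $\theta$ with probability
coverage $\alpha + O_P(n^{-1/2})$ as follows. Denoting by $z_\alpha$ the
standard normal $\alpha$th quantile, we have

(16)  $\tilde P_{\theta|\tilde X}\left(\theta \le \hat\theta_n + \frac{z_\alpha}{\sqrt{nI}}\right) = \alpha + O_P(n^{-1/2})$,

(17)  $\tilde P_{\theta|\tilde X}\left(\hat\theta_n - \frac{z_{1-\alpha/2}}{\sqrt{nI}} \le \theta \le \hat\theta_n + \frac{z_{1-\alpha/2}}{\sqrt{nI}}\right) = 1 - \alpha + O_P(n^{-1/2})$

for $\alpha \in (0,1)$, where $I = \tilde I_0$ or $\hat I_n$.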
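
As a small worked illustration of (17), the sketch below computes the two-sided normal-approximation interval from an estimate $\hat\theta_n$, an information value $I$ and the sample size $n$. The function name `normal_credible_interval` is hypothetical, and the information value passed in is assumed to be $\tilde I_0$ or an estimate such as $\hat I_n$.

```python
import math
from statistics import NormalDist


def normal_credible_interval(theta_hat, information, n, alpha):
    """Interval theta_hat -/+ z_{1 - alpha/2} / sqrt(n * I), following (17)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z / math.sqrt(n * information)
    return theta_hat - half_width, theta_hat + half_width
```

For instance, `normal_credible_interval(1.0, 4.0, 100, 0.05)` gives an interval centered at 1 with half-width $1.96/\sqrt{400} \approx 0.098$.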

Corollary 1. Under the assumptions of Theorem 2, let $f_n(\cdot)$ be the
posterior profile density of $\sqrt{n}\varrho_n$ relative to the prior
$\rho(\theta)$. We then have

(18)  $f_n(\xi) = \phi(\xi) + O_P(n^{-1/2})$.

Remark 4. The parametric analog of (18) is (2.2) in [6], which is a
higher order expansion of the multivariate posterior density of the vector
$\sqrt{n}(\theta - \hat\theta_n)$ in a parametric model. Note that the
parametric version involves the full likelihood rather than the profile
likelihood and thus a prior is assigned to each element of the multivariate
$\theta$. However, the posterior distributions relative to the full likelihood
and the profile likelihood coincide for certain special priors which will be
discussed in Remark 7 below.

Corollary 2. Under the assumptions of Theorem 2 and recalling that
$\varrho_n = (\theta - \hat\theta_n)\tilde I_0^{1/2}$, we have that if
$\int |\theta|^r \rho(\theta)\,d\theta < \infty$, then

(19)  $\tilde E_{\theta|\tilde X}(\varrho_n^r) = n^{-r/2}EU^r + O_P(n^{-(r+1)/2})$,

where $\tilde E_{\theta|\tilde X}(\varrho_n^r)$ is the $r$th posterior moment
of $\varrho_n$ and $U \sim N(0,1)$.

Remark 5. Note that the $r$th posterior moment of $\varrho_n$ in the above is
based on the posterior profile distribution. By Corollary 2, we thus have

(20)  $\hat\theta_n = \tilde E_{\theta|\tilde X}(\theta) + O_P(n^{-1})$,

(21)  $\tilde I_0 = \frac{1}{n\,\widetilde{\mathrm{Var}}_{\theta|\tilde X}(\theta)} + O_P(n^{-1/2})$,

where

$\widetilde{\mathrm{Var}}_{\theta|\tilde X}(\theta) = \tilde E_{\theta|\tilde X}(\theta - \tilde E_{\theta|\tilde X}(\theta))^2$.

From (20), we know the maximum likelihood estimator of $\theta$ can be
estimated by the mean of the profile sampler with an error of order
$O_P(n^{-1})$. Moreover, from the proof in Section 7 of Theorem 3 below, we can
verify that $\hat\theta_n$ is also estimated by the median of the profile
sampler to the same order of accuracy. Similarly, the efficient information can
be estimated by the inverse of the variance of the profile sampler with an
error of order $O_P(n^{-1/2})$. This is a better method to estimate
$\tilde I_0$ than (11) since it is automatically $\sqrt{n}$-consistent. Note
that the first-order versions of (20) and (21) can be derived
from Theorem 1 of [17].

Combining (9) and (20), we know that the mean value of the profile
sampler is a semiparametric efficient estimator of $\theta$. This
conclusion also holds for the median value of the profile sampler. In this
paper, we have thus provided an alternative efficient estimator to the maximum
likelihood estimator.
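
The estimators in (20) and (21) are simple summaries of the draws produced by the profile sampler. The following sketch, using only the Python standard library, computes them from a list of sampled values of $\theta$; the function name `summarize_chain` and the argument `chain` are illustrative, and the chain is assumed to have had its burn-in period already removed.

```python
import statistics


def summarize_chain(chain, n):
    """Return (mean, median, information estimate) from draws of theta.

    The mean and median estimate the maximum likelihood estimator as in (20);
    1 / (n * variance) estimates the efficient information as in (21).
    """
    mean = statistics.fmean(chain)
    median = statistics.median(chain)
    information = 1.0 / (n * statistics.variance(chain, mean))
    return mean, median, information
```

Here `statistics.variance` is the sample variance with divisor one less than the chain length, which differs negligibly from the posterior variance for long chains.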

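We now present the main theorem of this paper. The $\alpha$th quantile of the
posterior profile distribution, $\tau_{n\alpha}$, is defined as
$\tau_{n\alpha} = \inf\{\xi : \tilde P_{\theta|\tilde X}(\theta \le \xi) \ge \alpha\}$.
Without loss of generality,
$\tilde P_{\theta|\tilde X}(\theta \le \tau_{n\alpha}) = \alpha$. We can also
define $\kappa_{n\alpha} = \sqrt{n}(\tau_{n\alpha} - \hat\theta_n)$, that is,
$\tilde P_{\theta|\tilde X}(\sqrt{n}(\theta - \hat\theta_n) \le \kappa_{n\alpha}) = \alpha$.
The following theorem ensures that there exists a $\hat\kappa_{n\alpha}$ based
on the data such that
$P(\sqrt{n}(\hat\theta_n - \theta_0) \le \hat\kappa_{n\alpha}) = \alpha$ and
$|\hat\kappa_{n\alpha} - \kappa_{n\alpha}| = O_P(n^{-1/2})$.

Theorem 3. Under the assumptions of Theorem 2 and assuming that
$\tilde\ell_0(X)$ has finite third moment with a nondegenerate distribution,
there exists a $\hat\kappa_{n\alpha}$ based on the data such that
$P(\sqrt{n}(\hat\theta_n - \theta_0) \le \hat\kappa_{n\alpha}) = \alpha$ and
$\hat\kappa_{n\alpha} - \kappa_{n\alpha} = O_P(n^{-1/2})$.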

Remark 6. Note that the nondegenerate distribution assumption on
$\tilde\ell_0(X)$ can be easily satisfied if $X$ has a nonsingular absolutely
continuous component. Theorem 3 implies that the one- (two-) sided confidence
interval for $\theta$ can be estimated by the one- (two-) sided credible set of
the same level from the profile sampler with an error of the order
$O_P(n^{-1})$. We conjecture that $\sqrt{n}$ times the $O_P(n^{-1/2})$ term in
Theorem 3 converges to the product of two different nontrivial but uniformly
integrable Gaussian processes.

Remark 7. We can essentially generate the profile sampler from the
marginal posterior of $\theta$ with respect to a certain joint prior on
$(\theta,\eta)$ which is possibly data-dependent [17]. For example, in the Cox
model with right-censored data, a gamma process prior on $\eta$ [12] with jumps
at observed event times, but not involving $\theta$, can be such a prior.

5. Examples. We now illustrate the verification of the assumptions of 
Section 2 with three examples. The detailed technical illustrations and model 
assumptions for the three examples can be found in [19, 20, 21, 22, 23]. We 
also present simulation studies to assess the properties of the profile sampler 
for the first example. 

5.1. The Cox model with right-censored data.

5.1.1. Theory. The Cox model is

(22)  $\lambda(t|z) \equiv \lim_{\Delta \downarrow 0} \frac{1}{\Delta}P(t \le T < t + \Delta \mid T \ge t, Z = z) = \lambda(t)\exp(\theta z)$,

where $\lambda$ is an unspecified baseline hazard function and $\theta$ is a
vector including the regression parameters [4]. For the Cox model applied to
right-censored failure time data, we observe $X = (Y, \delta, Z)$, where
$Y = T \wedge C$, $\delta = I\{T \le C\}$ and
$Z \in \mathcal{Z} \subset \mathbb{R}^1$ is a regression covariate. $T$ is a
failure time with integrated hazard $e^{\theta z}\Lambda(t)$ given the
covariate $Z$, where $\Lambda(y) = \int_0^y \lambda(t)\,dt$ is
a cadlag, monotone increasing cumulative hazard function with
$\Lambda(0) = 0$. $C$ is a censoring time independent of $T$ given $Z$. We
define a likelihood for the parameter $(\theta, \Lambda)$ by replacing
$\lambda(y)$ with the point mass $\Lambda\{y\}$:

(23)  $\mathrm{lik}(\theta,\Lambda) = \left(e^{\theta z}\Lambda\{y\}e^{-e^{\theta z}\Lambda(y)}\right)^{\delta}\left(e^{-e^{\theta z}\Lambda(y)}\right)^{1-\delta}$.

$\theta$ is assumed to come from some compact set $\Theta$ and the true
regression coefficient, $\theta_0$, belongs to the interior of $\Theta$. The
parameter space for $\Lambda$, $\mathcal{H}$, is restricted to a set of
nondecreasing, cadlag functions on the interval $[0,\tau]$,
with $\Lambda(\tau) \le M$ for a given constant $M$.

We now discuss the form of the profile likelihood. Suppose there are $l$
observed failures at times $T_{(1)} < \cdots < T_{(l)}$, where $(i)$ is the
label for the $i$th ordered failure and $t_i$ is the observed value of
$T_{(i)}$. $z_{[i]}$ is the covariate corresponding to the observed event time
$t_i$. The log-profile likelihood (equivalently, the log-partial likelihood)
for $\theta$ is given by

(24)  $\log pl_n(\theta) = \sum_{i=1}^{l}\left(\theta z_{[i]} - \log\sum_{j \in R_i} e^{\theta z_j}\right)$,






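where $R_i = \{j : Y_j \ge t_i\}$ is the risk set. In this case, the profiled
nuisance parameter is not present in $pl_n(\theta)$. Nevertheless, it is not
hard to verify that

$\hat\Lambda_\theta(t) = \sum_{i : y_i \le t}\frac{\delta_i}{\sum_{j \in R_i}\exp(\theta z_j)}$.

Note that $\hat\Lambda_\theta$ is a nondecreasing step function with support
points at the observed event times and, based on [10],
$\|\hat\Lambda_{\tilde\theta_n} - \Lambda_0\|_\infty = O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)$.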
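
The profiled cumulative hazard above is straightforward to evaluate. The sketch below is a minimal illustration with a hypothetical function name, `breslow_estimator`, assuming a scalar covariate and no tied event times; it returns the jump locations of $\hat\Lambda_\theta$ together with its value at each of them.

```python
import math


def breslow_estimator(theta, y, delta, z):
    """Return (event times, cumulative hazard values) for a fixed theta."""
    times, values, cumulative = [], [], 0.0
    for i in sorted(range(len(y)), key=lambda k: y[k]):
        if delta[i]:
            # Denominator: sum over the risk set R_i = {j : Y_j >= y_i}.
            denom = sum(
                math.exp(theta * z[j]) for j in range(len(y)) if y[j] >= y[i]
            )
            cumulative += 1.0 / denom
            times.append(y[i])
            values.append(cumulative)
    return times, values
```

With `y = [1, 2, 3]`, `delta = [1, 0, 1]`, `z = [0, 1, 0]` and `theta = 0`, the risk sets at the two event times contain three subjects and one subject, so the estimator jumps by 1/3 at time 1 and by 1 at time 3.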

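The score function for $\theta$ can be easily derived as

$\dot\ell_{\theta,\Lambda}(x) = \delta z - z e^{\theta z}\Lambda(y)$.

Given a fixed $\Lambda$ and a bounded function
$h : \mathbb{R}_+ \mapsto \mathbb{R}$, we can define a path $\Lambda_t$ by
$d\Lambda_t(y) = (1 + th(y))\,d\Lambda(y)$. Thus, the score function for
$\Lambda$ in the direction $h$ via an operator
$A_{\theta,\Lambda} : L_2(\Lambda) \mapsto L_2(P_{\theta,\Lambda})$ is
$A_{\theta,\Lambda}h(y,\delta,z) = \delta h(y) - e^{\theta z}\int_0^y h\,d\Lambda$.
Following the regular conditions and discussions on page 16 of [22], the least
favorable direction $h_{\theta,\Lambda}$ at $(\theta,\Lambda)$ can be
constructed as

$h_{\theta,\Lambda}(y) = \frac{E_{\theta,\Lambda}\,e^{\theta Z}Z\,1\{Y \ge y\}}{E_{\theta,\Lambda}\,e^{\theta Z}\,1\{Y \ge y\}}$.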

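Substituting $\theta = t$ and $\Lambda = \Lambda_t(\theta,\Lambda)$ [where
$d\Lambda_t(\theta,\Lambda) = (1 + (\theta - t)h_0)\,d\Lambda$
and $h_0(\cdot)$ is an abbreviation for $h_{\theta_0,\Lambda_0}(\cdot)$] into
the above Cox likelihood and differentiating with respect to $t$, we obtain

$\dot\ell(t,\theta,\Lambda)(x) = \dot\ell_{t,\Lambda_t(\theta,\Lambda)}(x) - A_{t,\Lambda_t(\theta,\Lambda)}\left(\frac{h_0}{1 + (\theta - t)h_0}\right)(x)$,

$\ddot\ell(t,\theta,\Lambda)(x) = -\delta\frac{h_0^2(y)}{(1 + (\theta - t)h_0(y))^2} - z^2e^{tz}\Lambda_t(\theta,\Lambda)(y) + 2ze^{tz}\int_0^y h_0\,d\Lambda$,

$\ell_{t,\theta}(t,\theta,\Lambda)(x) = -ze^{tz}\int_0^y h_0\,d\Lambda + \delta\frac{h_0^2(y)}{(1 + (\theta - t)h_0(y))^2}$,

$\ell^{(3)}(t,\theta,\Lambda)(x) = -2\delta\frac{h_0^3(y)}{(1 + (\theta - t)h_0(y))^3} - z^3e^{tz}\Lambda_t(\theta,\Lambda)(y) + 3z^2e^{tz}\int_0^y h_0\,d\Lambda$.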

We know the maps
$(t,\theta,\Lambda) \mapsto \partial^k\ell(t,\theta,\Lambda)/\partial t^k$, for
$k = 1,2,3$, are continuous and uniformly bounded around
$(\theta_0,\theta_0,\Lambda_0)$, relative to the uniform topology on
$\Lambda$, by the inequality
$|\int_0^y h_0\,d(\Lambda - \Lambda_0)| \le \|\Lambda - \Lambda_0\|_\infty\|h_0\|_{BV}$,
where $\|h_0\|_{BV}$ is the total variation of $h_0(\cdot)$ in $[0,\tau]$. The
total variation of a function $f : [a,b] \mapsto \mathbb{R}$, $\|f\|_{BV}$, is
$|f(a)| + \int_{(a,b]}|df(s)|$. Considering the fact that the
class of uniformly bounded functions with bounded variation over compacta
is P-Donsker, we can check the empirical process assumptions by repeatedly
using the P-Donsker preservation results. The following lemmas verify the
remaining conditions, so the results of Sections 3 and 4 hold.

Lemma 2. Under the above set-up for the proportional hazards Cox 
model, assumption 6 is satisfied. 

Lemma 3. Under the above set-up for the proportional hazards Cox 
model, condition (12) is satisfied. 

5.1.2. Simulation study. To verify that the profile sampler can generate
second-order frequentist valid inference, we conducted simulations for Cox
regression with right-censored data for various sample sizes under a Lebesgue
prior. For each sample size, 500 data sets were analyzed. The event times
were generated from (22) with one covariate $Z \sim U[0,1]$. The regression
coefficient is $\theta = 1$ and $\Lambda(t) = \exp(t) - 1$. The censoring time
is $C \sim U[0,t_n]$, where $t_n$ was chosen such that the average effective
sample size over 500 samples is approximately $0.9n$. For each dataset, Markov
chains of length 5,000 with a burn-in period of 1,000 were generated using the
Metropolis algorithm. The jumping density for the coefficient was normal,
centered at the current iterate, with variance tuned to yield an acceptance
rate of 20-40%. The approximate variance of the estimator of $\theta$ was
computed by numerical differentiation with step size proportional to
$n^{-1/2}$, according to Remark 3.
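
The simulation design just described can be sketched as follows. This is a minimal, hedged illustration rather than the authors' implementation: it draws one data set, then runs a random-walk Metropolis chain on the Cox log-partial likelihood under a flat prior. The function names, the censoring bound `censor_max` and the fixed proposal standard deviation `proposal_sd` are assumptions made for the example; the paper instead tunes $t_n$ and the proposal variance to hit its stated targets.

```python
import math
import random


def simulate_cox(n, theta, rng, censor_max):
    """Draw (Y, delta, Z) with Z ~ U[0,1], Lambda(t) = exp(t) - 1, C ~ U[0, censor_max]."""
    data = []
    for _ in range(n):
        z = rng.random()
        # e^{theta z} * Lambda(T) is unit exponential, so invert Lambda.
        t = math.log(1.0 + rng.expovariate(1.0) * math.exp(-theta * z))
        c = rng.uniform(0.0, censor_max)
        data.append((min(t, c), 1 if t <= c else 0, z))
    data.sort(key=lambda row: -row[0])  # decreasing time, for running risk-set sums
    return data


def log_pl(theta, data):
    """Cox log-partial likelihood for data sorted by decreasing time (no ties)."""
    total, risk_sum = 0.0, 0.0
    for _, d, z in data:
        risk_sum += math.exp(theta * z)
        if d:
            total += theta * z - math.log(risk_sum)
    return total


def profile_sampler(data, rng, length=5000, burn_in=1000, proposal_sd=0.7, start=0.0):
    """Random-walk Metropolis targeting exp(log_pl) under a flat (Lebesgue) prior."""
    theta, current = start, log_pl(start, data)
    chain, accepted = [], 0
    for _ in range(length):
        proposal = theta + rng.gauss(0.0, proposal_sd)
        candidate = log_pl(proposal, data)
        # Accept with probability min(1, pl(proposal) / pl(theta)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        chain.append(theta)
    return chain[burn_in:], accepted / length
```

The mean and variance of the returned chain can then be compared with the maximum partial likelihood estimate and with the numerically differentiated information, which is the comparison reported in the tables of this section.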

Table 1 summarizes the results from the simulations giving the average
across 500 samples of the maximum likelihood estimate (MLE), mean of the
profile sampler (CM), mean squared difference between two estimates of
$\theta$ (MSDe), estimated standard errors based on MCMC (SEm), estimated
standard errors based on numerical derivatives (SEn), mean squared difference
between the two estimated standard errors (MSDv) and empirical coverage
of nominal 0.95 confidence intervals based on MCMC (CP95). The Monte
Carlo standard error of CP95 is
$\sqrt{0.05 \times 0.95/500} \approx 0.01$. Table 2 summarizes the difference
of boundaries for the two-sided 95% confidence interval
for $\theta$ generated by numerical differentiation, that is, (17), and MCMC,
respectively. LBm (LBn) and UBm (UBn) denote the lower and upper bound,
respectively, of the confidence interval from the MCMC chain (numerical
derivative).

Table 1
Simulation results for Cox regression with right-censored data based on 500 samples (the
true value of the regression coefficient is $\theta = 1$)

n      MLE      CM       √MSDe    SE_M     SE_N     √MSDv    CP95
20     1.1049   1.1376   0.0978   4.4128   4.3004   0.2934   0.9496
50     1.0202   1.0262   0.0275   3.9869   3.9548   0.1378   0.9496
100    1.0156   1.0181   0.0195   3.8592   3.8561   0.1568   0.9506
200    1.0131   1.0147   0.0114   3.8124   3.8105   0.1220   0.9500
500    1.0012   1.0016   0.0069   3.7598   3.7691   0.1206   0.9502

n, sample size; MLE, maximum likelihood estimator; CM, empirical mean; MSDe, mean
squared difference between two estimates of $\theta$; SE_M, estimated standard errors based on
MCMC; SE_N, estimated standard errors based on numerical derivatives; MSDv, mean
squared difference between the two estimated standard errors; CP95, empirical coverage
of nominal 0.95 confidence intervals based on MCMC.

In all cases, the bias in Table 1 is small. Similar simulations were also
performed for the Cox model with current status data in [17] (see Table 1 there),
and showed larger bias. The contrast between the two simulations reveals an interesting
phenomenon: the profile sampler based on the semiparametric model with
the faster convergence rate is more accurate. Note that the terms $n|\mathrm{MLE} - \mathrm{CM}|$,
$\sqrt{n}|\mathrm{SE}_M - \mathrm{SE}_N|$, $n|\mathrm{LB}_M - \mathrm{LB}_N|$ and $n|\mathrm{UB}_M - \mathrm{UB}_N|$ are bounded in
probability according to Corollary 1 and Theorem 3, that is, (11), (20) and
(21). The realizations of these terms summarized in Table 2 clearly illustrate
their boundedness. Based on the above results, we can conclude that the
profile sampler is a second-order frequentist valid procedure.
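To make the comparison between the MCMC-based and derivative-based summaries concrete, here is a minimal sketch of a profile sampler on a toy problem. It is not the Cox simulation of this section: the model (a normal mean with the variance profiled out), the flat prior, the random-walk Metropolis step and all function names are assumptions chosen so that the profile likelihood has a closed form.

```python
import math
import random

def profile_loglik(theta, xs):
    """Profile log-likelihood of the mean in a N(theta, sigma^2) model,
    with the nuisance variance maximized out in closed form."""
    n = len(xs)
    rss = sum((x - theta) ** 2 for x in xs)
    return -0.5 * n * math.log(rss / n)

def numerical_se(theta_hat, xs, h=1e-3):
    """Standard error from a numerical second derivative of the profile log-likelihood."""
    d2 = (profile_loglik(theta_hat + h, xs) - 2.0 * profile_loglik(theta_hat, xs)
          + profile_loglik(theta_hat - h, xs)) / (h * h)
    return 1.0 / math.sqrt(-d2)

def profile_sampler(xs, theta_init, step, n_iter=20000, burn_in=2000, seed=1):
    """Random-walk Metropolis chain targeting the profile likelihood (flat prior)."""
    rng = random.Random(seed)
    theta, logp = theta_init, profile_loglik(theta_init, xs)
    draws = []
    for it in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        logp_prop = profile_loglik(prop, xs)
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            theta, logp = prop, logp_prop
        if it >= burn_in:
            draws.append(theta)
    return draws

def run_demo(n=200, seed=0):
    """Return (MLE, chain mean, chain standard deviation, numerical-derivative SE)."""
    rng = random.Random(seed)
    xs = [rng.gauss(1.0, 1.0) for _ in range(n)]
    mle = sum(xs) / n
    se_n = numerical_se(mle, xs)
    draws = profile_sampler(xs, mle, step=2.4 * se_n)
    cm = sum(draws) / len(draws)
    se_m = math.sqrt(sum((d - cm) ** 2 for d in draws) / (len(draws) - 1))
    return mle, cm, se_m, se_n

if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the chain mean plays the role of CM and the chain standard deviation the role of SE_M, so the two pairs (MLE, CM) and (SE_M, SE_N) can be compared directly.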

5.2. The proportional odds model with right-censored data. The survival
function in this example is parameterized such that the ratios of the odds
of survival for subjects with different covariates are constant over time: the
conditional survival function $S_Z(u)$ of the event time $T$, given the covariate
$Z$, satisfies $-\operatorname{logit}(S_Z(u)) = \log\eta(u) + \theta Z$, where $\operatorname{logit}(y) = \log(y/(1-y))$.
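The defining property of this parameterization can be checked numerically. The sketch below assumes a scalar covariate and a hypothetical baseline $\eta(u) = u^2$ (neither is taken from the paper) and evaluates the odds of survival, whose ratio across covariate values should not depend on $u$.

```python
import math

def survival(u, z, theta, eta):
    """S_z(u) solving -logit(S) = log(eta(u)) + theta * z."""
    e = math.exp(-z * theta)
    return e / (eta(u) + e)

def odds_of_survival(u, z, theta, eta):
    """Odds S / (1 - S) of surviving past time u (requires eta(u) > 0)."""
    s = survival(u, z, theta, eta)
    return s / (1.0 - s)

def eta_example(u):
    """Hypothetical increasing baseline function, used only for illustration."""
    return u ** 2

if __name__ == "__main__":
    theta = 0.7
    for u in (0.5, 1.0, 3.0):
        ratio = odds_of_survival(u, 1.0, theta, eta_example) / odds_of_survival(u, 0.0, theta, eta_example)
        print(u, ratio, math.exp(-theta))
```

The printed ratio equals $e^{-\theta}$ at every time point, which is the constant odds ratio implied by the model.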



Table 2
Simulation results for confidence intervals

n      n|MLE - CM|   √n|SE_M - SE_N|   n|LB_M - LB_N|   n|UB_M - UB_N|
20     0.6541        0.5027            0.1920           2.2823
50     0.3062        0.2270            0.1809           1.1212
100    0.2587        0.0311            0.5987           0.1301
200    0.3218        0.0279            0.4810           0.5253
500    0.2017        0.2080            0.7524           0.3518

LB_M (UB_M), lower (upper) bound of the 95% confidence interval based on MCMC; LB_N
(UB_N), lower (upper) bound of the 95% confidence interval based on numerical derivative.
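The likelihood for a single observation $x = (y, \delta, z)$, with $\delta$ the censoring indicator, is

$\mathrm{lik}(\theta,\eta)(x) = \left(\frac{e^{-z\theta}\,\eta\{y\}}{(\eta(y)+e^{-z\theta})(\eta(y-)+e^{-z\theta})}\right)^{\delta}\left(\frac{e^{-z\theta}}{\eta(y)+e^{-z\theta}}\right)^{1-\delta}$,

where $\eta\{y\}$ is the jump size in $\eta$ at $y$. The score function for $\theta$ is

$\dot\ell_{\theta,\eta}(x) = z e^{-z\theta}\left[\frac{1}{\eta(y)+e^{-z\theta}} + \frac{\delta}{\eta(y-)+e^{-z\theta}}\right] - z$.

The score function for $\eta$ in the direction of a bounded function $h \in L_2(\eta)$ is

$A_{\theta,\eta}h(x) = \frac{\partial}{\partial t}\Big|_{t=0}\ell_{\theta,\eta_t}(x) = \delta h(y) - \frac{\int_0^{y} h\,d\eta}{\eta(y)+e^{-z\theta}} - \delta\,\frac{\int_0^{y-} h\,d\eta}{\eta(y-)+e^{-z\theta}}$,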

where $d\eta_t = (1 + th)\,d\eta$. Let $A^*_{\theta,\eta}$ denote the adjoint of $A_{\theta,\eta}$. Then, $A^*_{\theta,\eta}A_{\theta,\eta}$
is the information operator for the nuisance parameter $\eta$ when $\theta$ is known. It
is shown to be continuously invertible on the space of functions of bounded
variation on $[0,\tau]$ in Lemma 4.3 of [19]. Hence, the least favorable direction
$h_0$ is defined as $(A^*_{\theta_0,\eta_0}A_{\theta_0,\eta_0})^{-1}A^*_{\theta_0,\eta_0}\dot\ell_{\theta_0,\eta_0}$. The form of the information
operator and of $A^*_{\theta,\eta}$ can be found in [19]. By setting $d\eta_t(\theta,\eta) = (1 + (\theta - t)h_0)\,d\eta$,
we can obtain $\ell(t,\theta,\eta) = \log\mathrm{lik}(t,\eta_t(\theta,\eta))$. Hence, the maps in assumption
4 can be derived as follows:

,-zt g^-zt 



i{t,9,ri){x) 



-z 1 



£{t,9,7i){x) 
Wt\^{y,z) 
et,e{t^9,rj){x) 
Wt',e,,{y,z) 
l^^\t,9,r^) 



-6 
-6 



Vtiy) +e ^* 



+ 



'nt{y-) + (^ ^* 
lo K dy 



l + {9-t)ho{y) vt{y) + e-'' 
ho{y? 



+ 



{l + {9-t)ho{y)) 



'nt{y-) + e 

^+W^^s^^{y,z) + 6W,y^^{y-,z), 



Uo ho dr] + ze- 



-zt\2 



z e 



-zt 



{vt{y) + e 



-zt\ 



(??t(y) + e-"*)2 



{l + {9-t)hoiy)y 

ho d-nize"''^ + ho drj) 



+ Wle^^{y,z) + 6Wr,e,r,iy- 



(7?i(y) + e-'*)2 

6hl{y) 
-'^{l + {9-t)hoiy))^ 

2{ze-'' + jyhodrjf 
(r/t(y) + e-^*)3 



+ Wt'^0,v(y^^) + ^Wt',e,r!{y-,z) 



z e 



-zt/ 



(e-^t + r?f (y))(2ze-^* + 3/(f hp dy - zrjtjy)) 

ivtiy) + e-""^)^ 
Under the regularity conditions, the above maps are continuous and uniformly
bounded around $(\theta_0,\theta_0,\eta_0)$ by the same reasoning as was used in the first
example. We also know that $\|\hat\eta_{\tilde\theta_n} - \eta_0\|_\infty = O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)$ by Theorem
3.1 in [21]. By techniques similar to those used in the first example, we
can easily verify assumption 5. The following lemmas verify the remaining
conditions.

Lemma 4. Under the above set-up for the proportional odds model,
assumption 6 is satisfied.

Lemma 5. Under the above set-up for the proportional odds model,
condition (12) is satisfied.

5.3. Case-control studies with a missing covariate. The third example is
a logistic regression model for case-control studies with a missing covariate
considered by [23] and [24]. We observe two independent random samples of
sizes $n_C$ and $n_R$ from the distributions of $(D, W, Z)$ and $(D, W)$, respectively.
Following the assumptions concerning the distribution of the random vector
$(D, W, Z)$ in [24], we can construct the likelihood for the vector $(D, W, Z)$ in
the form $p_\theta(d,w\mid z)\,d\eta(z)$, where

(27) $p_\theta(d,w\mid z) = (\Xi_{\gamma,\beta}(z))^{d}(1 - \Xi_{\gamma,\beta}(z))^{1-d}\,\frac{1}{\sigma}\phi\left(\frac{w - \alpha_0 - \alpha_1 z}{\sigma}\right)$,

$\Xi_{\gamma,\beta}(z) = (1 + \exp(-\gamma - \beta e^{z}))^{-1}$, $\phi$ is the standard normal density and $d\eta$ denotes the density of $\eta$ with respect
to some dominating measure on $\mathcal{Z} \subset \mathbb{R}$.

We assume $n_C = n_R$ so that the observations can be paired. Here,
we denote the complete sample components by $Y_C = (D_C, W_C)$ and $Z_C$ and
the reduced sample components by $Y_R = (D_R, W_R)$. Thus, the likelihood is
defined as

(28) $\mathrm{lik}(\theta,\eta)(x) = p_\theta(y_C\mid z_C)\,\eta\{z_C\}\int p_\theta(y_R\mid z)\,d\eta(z)$.

The unknown parameters are $\theta = (\beta,\alpha_0,\alpha_1,\gamma,\sigma)$, ranging over a compact
$\Theta \subset \mathbb{R}^4 \times (0,\infty)$, and the distribution $\eta$ of the regression variable, which is
restricted to the set of nondegenerate probability distributions with support
within a known compact interval. We will concentrate on the regression
coefficient $\beta$, considering $\theta_2 = (\alpha_0,\alpha_1,\gamma,\sigma)$ and $\eta$ as nuisance parameters.
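The structure of the likelihood, a complete-data factor times a mixture over the unobserved covariate, can be sketched as follows. This is an illustration under stated assumptions: the Gaussian form of the $W \mid Z$ component follows the reconstruction of (27), the mixing distribution $\eta$ is taken to be discrete and stored as a dictionary of support points and masses, and all function names are hypothetical.

```python
import math

def xi(z, gamma, beta):
    """Logistic success probability Xi_{gamma,beta}(z)."""
    return 1.0 / (1.0 + math.exp(-gamma - beta * math.exp(z)))

def p_theta(d, w, z, theta):
    """Conditional density of (D, W) given Z = z, with theta = (beta, a0, a1, gamma, sigma)."""
    beta, a0, a1, gamma, sigma = theta
    p1 = xi(z, gamma, beta)
    bernoulli = p1 if d == 1 else 1.0 - p1
    resid = (w - a0 - a1 * z) / sigma
    normal = math.exp(-0.5 * resid ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return bernoulli * normal

def likelihood(theta, eta, y_c, z_c, y_r):
    """Paired-observation likelihood: complete pair (y_c, z_c) and reduced pair y_r.

    eta is a dict {support point: probability mass} (an assumed discrete mixing law).
    """
    d_c, w_c = y_c
    d_r, w_r = y_r
    mixture = sum(mass * p_theta(d_r, w_r, z, theta) for z, mass in eta.items())
    return p_theta(d_c, w_c, z_c, theta) * eta[z_c] * mixture

if __name__ == "__main__":
    theta = (0.5, 0.0, 1.0, -1.0, 1.2)
    eta = {-1.0: 0.3, 0.0: 0.4, 1.0: 0.3}
    print(likelihood(theta, eta, (1, 0.2), 0.0, (0, -0.5)))
```

The reduced observation contributes only through the mixture, which is why the score for $\theta$ on that part is a posterior average of the conditional score.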

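We start by introducing the least favorable submodel. The score function
for $\theta$, $\dot\ell_{\theta,\eta}$, is the sum of the score function for the conditional density
$p_\theta(y_C\mid z_C)$ and that for the mixture density $p_\theta(y_R\mid\eta) = \int p_\theta(y_R\mid z)\,d\eta(z)$, given as follows:

$\dot\ell_\theta(y_C\mid z_C) = \frac{\partial}{\partial\theta}\log p_\theta(y_C\mid z_C)$ and $\dot\ell_{\theta,\eta}(y_R) = \frac{\int\dot\ell_\theta(y_R\mid z)\,p_\theta(y_R\mid z)\,d\eta(z)}{p_\theta(y_R\mid\eta)}$.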
Furthermore, by defining $d\eta_t = (1 + th)\,d\eta$, where $h$ is an arbitrary bounded
function satisfying $\int h\,d\eta = 0$, we can obtain the score function for $\eta$ in the
direction $h$,

$A_{\theta,\eta}h(x) = h(z_C) + \frac{\int h(z)\,p_\theta(y_R\mid z)\,d\eta(z)}{p_\theta(y_R\mid\eta)}$.

By the projection principle discussed in [24], we thus define the least favorable
submodel $t \mapsto (\theta_t(\theta,\eta), \eta_t(\theta,\eta))$ through
$\theta_t(\theta,\eta) = \theta - (\beta - t)a_0$, $d\eta_t(\theta,\eta) = (1 + (\beta - t)a_0(h_0 - \eta h_0))\,d\eta$ and
$a_0 = (1, -\tilde I_{0,12}(\tilde I_{0,22})^{-1})$. The efficient information matrix $\tilde I_0$ can be decomposed
into four submatrices corresponding to the parameter $\beta$ and the group
$(\alpha_0,\alpha_1,\gamma,\sigma)$; $\tilde I_{0,ij}$ denotes the $(i,j)$th block of $\tilde I_0$ for $i = 1,2$ and
$j = 1,2$. In Section 8 of [23], the least favorable direction at the true value, $h_0$,
is proved to be a bounded and Lipschitz continuous function.

Let $(\hat\theta_{2,\beta}, \hat\eta_\beta)$ be the profile likelihood estimator for $(\theta_2,\eta)$ when $\beta$ is given,
so that $\hat\theta_\beta = (\beta, \hat\theta_{2,\beta})$. [21] showed that

(29) $\|\hat\eta_{\tilde\beta_n} - \eta_0\|_{BL_1} + \|\hat\theta_{\tilde\beta_n} - \theta_0\| = O_P(|\tilde\beta_n - \beta_0| + n^{-1/2})$

for any $\tilde\beta_n$ consistent for $\beta_0$. The norms applied to the function $\eta$ and the vector
$\theta$ are the weak topology norm and the Euclidean norm, respectively. The weak
topology norm on $\eta$ is defined as $\|\eta\|_{BL_1} = \sup_{h\in BL_1}|\int h(z)\,d\eta(z)|$, where
$BL_1$ denotes the set of all functions $h:\mathcal{Z} \mapsto [-1,1]$ whose Lipschitz norm is
bounded above by 1, that is, $|h(z_1) - h(z_2)| \le \|z_1 - z_2\|_{\mathcal{Z}}$. The following
lemmas verify the remaining conditions.

Lemma 6. Under the above set-up for the case-control model,
assumptions 4-6 are satisfied.

Lemma 7. Under the above set-up for the case-control model, condition
(12) is satisfied.

6. Discussion. Our theory ensures second-order frequentist correctness
of the profile Bayes analysis for the finite-dimensional parameter. Necessary
and sufficient conditions for third or higher order frequentist
inference remain to be established in order to complete a general higher order
theory of semiparametric frequentist inference. Future
work could also include extending our methods to semiparametric models
with slower convergence rates for the nuisance parameter, for example,
$\|\hat\eta_{\tilde\theta_n} - \eta_0\| = O_P(n^{-1/3} + \|\tilde\theta_n - \theta_0\|)$, as happens with the Cox model for current
status data. The conjecture in Remark 6 implies that the $O_P(n^{-1/2})$
rate in Theorem 2 is sharp. Proving this conjecture is therefore a natural goal
for future research, although it appears to be very challenging.
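7. Proofs.

Proof of Lemma 1. We first show the following no-bias conditions:

(30) $P\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) = O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)^2$,

(31) $P\ddot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) = P\ddot\ell_0 + O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)$.

The left-hand side of (30) can be written as $P\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - P\dot\ell_0 = [P\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - P\dot\ell(\theta_0,\theta_0,\hat\eta_{\tilde\theta_n})] +
[P\dot\ell(\theta_0,\theta_0,\hat\eta_{\tilde\theta_n}) - P\dot\ell_0]$. The second square bracket is bounded by $O_P(\|\hat\eta_{\tilde\theta_n} - \eta_0\|^2)$
by (5). By the ordinary two-term Taylor expansion, the first square
bracket equals

$(\tilde\theta_n - \theta_0)\,\frac{\partial}{\partial\theta}\Big|_{\theta=\theta_0}P\dot\ell(\theta_0,\theta,\hat\eta_{\tilde\theta_n}) + \frac{1}{2}(\tilde\theta_n - \theta_0)^2 \times \frac{\partial^2}{\partial\theta^2}\Big|_{\theta=\theta^*}P\dot\ell(\theta_0,\theta,\hat\eta_{\tilde\theta_n})$,

where $\theta^*$ is an intermediate value between $\tilde\theta_n$ and $\theta_0$. The second term of
this expansion is of order $|\tilde\theta_n - \theta_0|^2$, by assumption 4. We now consider the
first term. Define $\eta \mapsto L(\eta) = (\partial/\partial\theta)|_{\theta=\theta_0}P\dot\ell(\theta_0,\theta,\eta)$. Then, $L(\hat\eta_{\tilde\theta_n}) - L(\eta_0) =
O_P(\|\hat\eta_{\tilde\theta_n} - \eta_0\|)$ by (4) in assumption 6. Combining this with the fact that
$L(\eta_0) = 0$, we have $(\tilde\theta_n - \theta_0) \times (\partial/\partial\theta)|_{\theta=\theta_0}P\dot\ell(\theta_0,\theta,\hat\eta_{\tilde\theta_n}) = O_P(n^{-1/2} + |\tilde\theta_n - \theta_0|)^2$.
This completes the proof of (30). By assumption 3, the smoothness
conditions on $\ell(t,\theta,\eta)$ and (3), we can also show (31) using similar analysis.

Recall that $\dot\ell_0(X) = \tilde\ell_0(X)$. To show (6), it then suffices to show that $\mathbb{G}_n\sqrt{n}(\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) -
\dot\ell_0) = O_P(\sqrt{n}|\tilde\theta_n - \theta_0| + 1)$. Note that, by (2), $\mathbb{G}_n\sqrt{n}(\dot\ell(\theta_0,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - \dot\ell_0) =
\sqrt{n}(\tilde\theta_n - \theta_0)\mathbb{G}_n\ell_{t,\theta}(\theta_0,\theta_n^*,\hat\eta_{\tilde\theta_n}) + \sqrt{n}\,O_P(\|\hat\eta_{\tilde\theta_n} - \eta_0\|)$, where $\theta_n^*$ is an intermediate
value between $\theta_0$ and $\tilde\theta_n$. Combining this with assumption 5, we have
proven (6). Considering assumption 5 and (31), we can prove (7). □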

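Proof of Theorem 1. We first show (9). Note that

$0 = \mathbb{P}_n\dot\ell(\hat\theta_n,\hat\theta_n,\hat\eta_n) = \mathbb{P}_n\dot\ell(\theta_0,\hat\theta_n,\hat\eta_n) + (\hat\theta_n - \theta_0)\mathbb{P}_n\ddot\ell(\theta_0,\hat\theta_n,\hat\eta_n) + \frac{(\hat\theta_n - \theta_0)^2}{2}\mathbb{P}_n\ell^{(3)}(\theta_n^*,\hat\theta_n,\hat\eta_n)$,

where $\theta_n^*$ is intermediate between $\theta_0$ and $\hat\theta_n$. By considering Lemma 1 and assumption 5,
we obtain the following equation for $(\hat\theta_n - \theta_0)$:

$0 = n^{-1}\sum_{i=1}^n\tilde\ell_0(X_i) + (\hat\theta_n - \theta_0)P\ddot\ell_0 + O_P(n^{-1})$.

This completes the proof of (9). To prove (8), we first show that

(32) $\log pl_n(\tilde\theta_n) = \log pl_n(\theta_0) + (\tilde\theta_n - \theta_0)\sum_{i=1}^n\tilde\ell_0(X_i) - \frac{n}{2}(\tilde\theta_n - \theta_0)^2\tilde I_0 + O_P(n|\tilde\theta_n - \theta_0|^3 + n^{-1/2})$

for any $\tilde\theta_n$ satisfying $(\tilde\theta_n - \theta_0) = o_P(1)$. Note that

$n^{-1}(\log pl_n(\tilde\theta_n) - \log pl_n(\theta_0)) = \mathbb{P}_n\ell(\tilde\theta_n,\tilde\theta_n,\hat\eta_{\tilde\theta_n}) - \mathbb{P}_n\ell(\theta_0,\theta_0,\hat\eta_{\theta_0})$.

The right-hand side of the above equation is bounded below and above
by $\mathbb{P}_n(\ell(\tilde\theta_n,\tilde\psi_n) - \ell(\theta_0,\tilde\psi_n))$, where the lower and upper bounds
correspond to $\tilde\psi_n = (\theta_0,\hat\eta_{\theta_0})$ and $\tilde\psi_n = (\tilde\theta_n,\hat\eta_{\tilde\theta_n})$, respectively. We then apply a three-term Taylor
expansion to both the upper and lower bounds. By considering Lemma 1 and
assumption 5, we find that the upper bound and the lower bound match at
the order of $O_P(n^{-3/2} + |\tilde\theta_n - \theta_0|^3)$. We have thus proven (32). By replacing
$\tilde\theta_n$ with $\hat\theta_n$ in (32), we have

(33) $\log pl_n(\hat\theta_n) = \log pl_n(\theta_0) + (\hat\theta_n - \theta_0)\sum_{i=1}^n\tilde\ell_0(X_i) - \frac{n}{2}(\hat\theta_n - \theta_0)^2\tilde I_0 + O_P(n^{-1/2})$.

The difference between (32) and (33) gives (8) by considering (9). □

Proof of Theorem 2. Suppose that $F_n(\cdot)$ is the posterior profile distribution
of $\sqrt{n}\varrho_n$ w.r.t. the prior $\rho(\theta)$, where $\varrho_n = (\theta - \hat\theta_n)\tilde I_0^{1/2}$. The whole
proof of Theorem 2 can be briefly summarized in the following expression:

$F_n(\xi) = \frac{\int_{-\infty}^{\xi n^{-1/2}}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n}{\int_{-\infty}^{+\infty}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n}$.

For the denominator, we first prove that the posterior mass outside $|\varrho_n| \le r_n$
is of arbitrarily small order, where $r_n = o(n^{-1/3})$ and $\sqrt{n}r_n \to \infty$. The
mass inside this integration region can be approximated by a stochastic
polynomial in powers of $n^{-1/2}$ with an error of the order $O_P(n^{-1})$. The
numerator can be analyzed similarly. Finally, the asymptotic expansions
of both numerator and denominator yield the quotient series, which is the
desired result. We first state some lemmas before giving the formal proof of
Theorem 2.

Lemma 2.1. Let $r_n = o(n^{-1/3})$ and $\sqrt{n}r_n \to \infty$. Under the conditions of
Theorem 2, we have

(34) $\int_{|\varrho_n| > r_n}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n = O_P(n^{-1})$.

Proof. Fix $r > 0$. We then have

$\int_{|\varrho_n| > r}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n \le I\{\Delta_n^r < -n^{-1/2}\}\exp(-\sqrt{n})\int_\Theta\rho(\theta)\,d\theta + I\{\Delta_n^r \ge -n^{-1/2}\}$,

where $\Delta_n^r = \sup_{|\varrho_n| > r}\Delta_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})$. By a minor revision of Lemma A.1 in
the Appendix of [17], we have $I\{\Delta_n^r \ge -n^{-1/2}\} = O_P(n^{-1})$. This implies that
there exists a positive decreasing sequence $r_n = o(n^{-1/3})$ with $\sqrt{n}r_n \to \infty$
such that (34) holds. □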

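Lemma 2.2. Let $r_n = o(n^{-1/3})$ and $\sqrt{n}r_n \to \infty$. Under the conditions of
Theorem 2, we have

(35) $\int_{|\varrho_n| \le r_n}\left|\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2}) - \exp\left(-\frac{n}{2}\varrho_n^2\right)\rho(\hat\theta_n)\right|\,d\varrho_n = O_P(n^{-1})$.

Proof. The left-hand side of (35) is bounded by the sum of

(*) $\int_{|\varrho_n| \le r_n}\left|\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,\rho(\hat\theta_n) - \exp\left(-\frac{n}{2}\varrho_n^2\right)\rho(\hat\theta_n)\right|\,d\varrho_n$

and

(**) $\int_{|\varrho_n| \le r_n}\left|\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2}) - \frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,\rho(\hat\theta_n)\right|\,d\varrho_n$.

Using (8), we obtain

(*) $= \int_{|\varrho_n| \le r_n}\rho(\hat\theta_n)\exp\left(-\frac{n}{2}\varrho_n^2\right)\left|\exp(O_P(n|\varrho_n|^3 + n^{-1/2})) - 1\right|\,d\varrho_n$

$= n^{-1/2}\int_{|u_n| \le \sqrt{n}r_n}\rho(\hat\theta_n)\exp\left(-\frac{u_n^2}{2}\right)\left|\exp(n^{-1/2}(|u_n|^3 + 1)O_P(1)) - 1\right|\,du_n$

$= n^{-1} \times O_P(1) \times \int_{|u_n| \le \sqrt{n}r_n}\rho(\hat\theta_n)\exp\left(-\frac{u_n^2}{2}\right)(|u_n|^3 + 1)\,du_n = O_P(n^{-1})$,

where the second equality follows by replacing $\sqrt{n}\varrho_n$ with $u_n$ and the third
equality follows from the fact that $|\exp(n^{-1/2}(|u_n|^3 + 1)O_P(1)) - 1| =
O_P(1)n^{-1/2}(|u_n|^3 + 1)$ since $|u_n| \le \sqrt{n}r_n$ and $r_n = o(n^{-1/3})$, that is, $u_n =
o(n^{1/6})$. By the following analysis of (**), we can also show (**) $= O_P(n^{-1})$:

(**) $= \int_{|\varrho_n| \le r_n}\left|\varrho_n\tilde I_0^{-1/2}\dot\rho(\theta^*)\right|\exp\left(-\frac{n}{2}\varrho_n^2 + O_P(n|\varrho_n|^3 + n^{-1/2})\right)\,d\varrho_n$

$\le M\int_{|\varrho_n| \le r_n}|\varrho_n|\exp\left(-\frac{n}{2}\varrho_n^2\right)\,d\varrho_n \times \sup_{|\varrho_n| \le r_n}\exp(O_P(n|\varrho_n|^3 + n^{-1/2}))$,

where $\theta^*$ is an intermediate value between $\hat\theta_n$ and $\hat\theta_n + \varrho_n\tilde I_0^{-1/2}$. □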


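We next start the formal proof of Theorem 2. First, note that

$\int_{-\infty}^{+\infty}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n = \int_{|\varrho_n| > r_n}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n + \int_{|\varrho_n| \le r_n}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n$.

By Lemma 2.1, the first integral on the right-hand side is of the order
$O_P(n^{-1})$. The second integral on the right-hand side can be decomposed
into the following summands:

$\int_{|\varrho_n| \le r_n}\left[\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)} - \exp\left(-\frac{n}{2}\varrho_n^2\right)\rho(\hat\theta_n)\right]d\varrho_n + \int_{|\varrho_n| \le r_n}\exp\left(-\frac{n}{2}\varrho_n^2\right)\rho(\hat\theta_n)\,d\varrho_n$.

The first part is bounded by $O_P(n^{-1})$ via Lemma 2.2. The second part
equals

$n^{-1/2}\rho(\hat\theta_n)\int_{|u_n| \le \sqrt{n}r_n}e^{-u_n^2/2}\,du_n = n^{-1/2}\rho(\hat\theta_n)\int_{-\infty}^{+\infty}e^{-u_n^2/2}\,du_n + O(n^{-1})$,

where $u_n = \sqrt{n}\varrho_n$. The above equality follows from the inequality
$\int_x^{\infty}e^{-y^2/2}\,dy \le x^{-1}e^{-x^2/2}$ for any $x > 0$.
Consolidating the above analysis, we have

(36) $\int_{-\infty}^{+\infty}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n = n^{-1/2}\rho(\hat\theta_n)\sqrt{2\pi} + O_P(n^{-1})$

and, by similar analysis, we obtain

(37) $\int_{-\infty}^{\xi n^{-1/2}}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,\frac{pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})}{pl_n(\hat\theta_n)}\,d\varrho_n = n^{-1/2}\rho(\hat\theta_n)\int_{-\infty}^{\xi}e^{-y^2/2}\,dy + O_P(n^{-1})$.

The quotient of (37) and (36) generates the desired result, (13). This
completes the proof of Theorem 2 in its entirety. □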
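The Gaussian tail inequality used to extend the truncated integral to the whole line is easy to check numerically. The sketch below evaluates the exact tail integral through the complementary error function and compares it with the bound $x^{-1}e^{-x^2/2}$; the choice $r_n = n^{-0.4}$ is only one hypothetical sequence satisfying $r_n = o(n^{-1/3})$ and $\sqrt{n}r_n \to \infty$.

```python
import math

def upper_tail_integral(x):
    """Exact value of the integral of exp(-y^2 / 2) over (x, infinity)."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def mills_bound(x):
    """Upper bound exp(-x^2 / 2) / x, valid for x > 0."""
    return math.exp(-0.5 * x * x) / x

def truncation_error(n, exponent=0.4):
    """Mass of exp(-u^2 / 2) outside |u| <= sqrt(n) * r_n, with r_n = n^(-exponent)."""
    cutoff = math.sqrt(n) * n ** (-exponent)
    return 2.0 * upper_tail_integral(cutoff)

if __name__ == "__main__":
    for x in (0.1, 1.0, 2.0, 5.0):
        print(x, upper_tail_integral(x), mills_bound(x))
    for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
        print(n, truncation_error(n))
```

The truncation error decays faster than any power of $n$ once $\sqrt{n}r_n$ is large, which is why only the Gaussian integral over the whole line survives in (36).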

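Proof of Corollary 1. From the proof of Theorem 2, we have

$\tilde P_{\theta\mid\tilde X}(\sqrt{n}(\theta - \hat\theta_n)\tilde I_0^{1/2} \le \xi) = \frac{\int_{-\infty}^{\xi n^{-1/2}}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,(pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})/pl_n(\hat\theta_n))\,d\varrho_n}{\int_{-\infty}^{+\infty}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,(pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})/pl_n(\hat\theta_n))\,d\varrho_n}$.

By differentiating both sides with respect to $\xi$ and combining with (36), we obtain

$f_n(\xi) = \frac{\rho(\hat\theta_n + \xi n^{-1/2}\tilde I_0^{-1/2})\,(pl_n(\hat\theta_n + \xi n^{-1/2}\tilde I_0^{-1/2})/pl_n(\hat\theta_n))}{\sqrt{2\pi}\rho(\hat\theta_n) + O_P(n^{-1/2})}$.

Based on (8), the numerator in the above equals $\rho(\hat\theta_n)\exp(-\xi^2/2) + O_P(n^{-1/2})$
after some analysis. This completes the proof. □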

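Proof of Corollary 2. The expansion in (19) is the quotient of two
expansions of the form (36) and (37). We can see this as follows. First, the
$r$th posterior profile moment of $\varrho_n$ equals

$\frac{\int_{-\infty}^{+\infty}\varrho_n^r\,\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,(pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})/pl_n(\hat\theta_n))\,d\varrho_n}{\int_{-\infty}^{+\infty}\rho(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})\,(pl_n(\hat\theta_n + \varrho_n\tilde I_0^{-1/2})/pl_n(\hat\theta_n))\,d\varrho_n}$.

The denominator is $n^{-1/2}\sqrt{2\pi}\rho(\hat\theta_n) + O_P(n^{-1})$ by (36). Similarly, by the
proof of Theorem 2, we know the numerator is $n^{-(r+1)/2}\rho(\hat\theta_n)\sqrt{2\pi}EU^r +
O_P(n^{-(r+2)/2})$, equivalently, for even $r$, $(2/n)^{(r+1)/2}\Gamma((r+1)/2)\rho(\hat\theta_n) + O_P(n^{-(r+2)/2})$,
where $U \sim N(0,1)$. Hence, the quotient is $n^{-r/2}EU^r + O_P(n^{-(r+1)/2})$. If
$r$ is odd, the quotient is simply $O_P(n^{-(r+1)/2})$. □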
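The two expressions given for the leading term of the numerator agree for even $r$ because $\sqrt{2\pi}\,EU^r = 2^{(r+1)/2}\Gamma((r+1)/2)$. A short numerical check, with illustrative function names, is:

```python
import math

def even_normal_moment(r):
    """E[U^r] for U ~ N(0, 1) and even r, that is, the double factorial (r - 1)!!."""
    moment = 1
    for k in range(1, r, 2):
        moment *= k
    return moment

def leading_term_via_moment(r, n):
    """n^{-(r+1)/2} * sqrt(2 pi) * E[U^r]."""
    return n ** (-(r + 1) / 2.0) * math.sqrt(2.0 * math.pi) * even_normal_moment(r)

def leading_term_via_gamma(r, n):
    """(2 / n)^{(r+1)/2} * Gamma((r + 1) / 2)."""
    return (2.0 / n) ** ((r + 1) / 2.0) * math.gamma((r + 1) / 2.0)

if __name__ == "__main__":
    for r in (0, 2, 4, 6):
        print(r, leading_term_via_moment(r, 50), leading_term_via_gamma(r, 50))
```

Dividing either expression by the denominator $n^{-1/2}\sqrt{2\pi}$ gives $n^{-r/2}EU^r$, the leading term of the quotient.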

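Proof of Theorem 3. We first show that for any $\varepsilon \in (0, \frac{1}{2})$ and $\varepsilon <
\alpha < 1 - \varepsilon$,

(38) $\tau_{n\alpha} = \hat\theta_n + \frac{z_\alpha}{\sqrt{n\tilde I_0}} + O_P(n^{-1})$.

Implicit in (13) is an expansion of $\tau_{n\alpha}$ in terms of $z_\alpha$. First, we set $\tau_{n\alpha} = \hat\theta_n +
z_\alpha/\sqrt{n\tilde I_0} + r_n$ and we can then show $r_n = O_P(n^{-1})$. Plugging $\tau_{n\alpha}$ into (13),
we obtain $\alpha = \tilde P_{\theta\mid\tilde X}(\sqrt{n}(\theta - \hat\theta_n)\tilde I_0^{1/2} \le z_\alpha + \sqrt{n}\tilde I_0^{1/2}r_n) = \alpha + O_P(n^{-1/2}) +
\sqrt{n}\tilde I_0^{1/2}r_nf_n(\tau^*_{n\alpha})$, where $\tau^*_{n\alpha}$ is between $z_\alpha$ and $z_\alpha + \sqrt{n}\tilde I_0^{1/2}r_n$. The first
equality comes from the definition of $\tau_{n\alpha}$. The second equality follows from
Taylor expansion and (18). We can now deduce from these two equalities
that $r_n = -O_P(n^{-1})\tilde I_0^{-1/2}(\phi(\tau^*_{n\alpha}) + O_P(n^{-1/2}))^{-1} = O_P(n^{-1})$ based on (18).
Note that $r_n$ is well defined since $f_n(\tau^*_{n\alpha})$ is strictly positive when $\varepsilon < \alpha < 1 -
\varepsilon$. This completes the proof of (38). Next, the classical Edgeworth expansion
implies that $P(n^{-1/2}\sum_{i=1}^n\tilde\ell_0(X_i)\tilde I_0^{-1/2} \le z_\alpha + a_n) = \alpha$, where $a_n = O(n^{-1/2})$,
for $\varepsilon < \alpha < 1 - \varepsilon$. Let $\kappa_{n\alpha} = z_\alpha\tilde I_0^{-1/2} + (\sqrt{n}(\hat\theta_n - \theta_0) - n^{-1/2}\sum_{i=1}^n\tilde\ell_0(X_i)\tilde I_0^{-1}) +
a_n\tilde I_0^{-1/2}$. Then, $P(\sqrt{n}(\hat\theta_n - \theta_0) \le \kappa_{n\alpha}) = P(n^{-1/2}\sum_{i=1}^n\tilde\ell_0(X_i)\tilde I_0^{-1/2} \le z_\alpha +
a_n) = \alpha$. Combining (38) and (9), we obtain $\kappa_{n\alpha} - \tau_{n\alpha}$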

Proof of Lemma 2. We first compute the Frechet derivatives oiit,e{^) 
around {9q,9q,Kq) by means of As(y) = A(y) + s hdK = K{y) + sWK{y), 
where h{-) is an arbitrary bounded function. The corresponding Frechet 
derivatives follows: 

, ry 



Jo 

The operator it,e.AiW\) is linear and continuous by the inequality 



(39) |VA(VFA)-£t,e,A(^A)| 



< 







\od{WA-VA) 



<\\Wa-Va\ 



almost surely, since A is a cumulative hazard function with support [0, r]. It 
is also a bounded operator since we can replace V\ with zero in (39). Note 
that it,8, Ai^) = by its linearity. By similar reasoning, we can also know that 
^t,t,A{WA) and £t,A(WA) are both linear, continuous and bounded operators 
when (t, 9) is in some neighborhood of (^O) ^o) and AgTC. The boundedness 
of the above two operators ensure that P{it,e{do,(^o, -A) — ^t,e{0Oi(^Oi -^o)) = 

Op(IIA-Aolloo) and P(^'(0o, ^o, A) - ^'(^o, ^o, Ao)) = Op(||A '- AolU) when 
A is in some neighborhood of Aq. To verify (5), we need to show that A 
lik(0O)A) is second-order Frechet differentiable around Aq. To this end, the 
first derivative is 

while the second derivative is 

pdoz 
X {e^'''K{y}VK{y)WK{y) 

- e'°'WK{y}VK{y) - e'^''V^{y}W^{y)f . 

Clearly, likA(WA,VA) is a bounded bilinear operator. Its continuity follows 
from the continuity of the maps W\ i— > likA(WA, •) and V\ ^ likA(-, Va)- 

Next, we need to show that G„(i(6'o, ^o, A) - 4) = Opdl A - Ao||oo)- First, 
note that the class of functions of {z,y), 

{ze^«-~(Ml{A(y) <t}- Ao{y)) : || A - AolU < 7, * e [0, r]}, 

is a VC class for each 7 < co. Since A is monotone and bounded by M < 00, 
we now have that the class {ze^°^{A{y) — Ao(y)) : ||A — Ao||oo < 7} is a VC- 
hull class for each 7 < 00. Since koj is an envelope for this last class for some 
/cq < 00 that does not depend on 7, we can use Theorem 2.14.1 in [29] to 



obtain that 



ize 



(A(y) 



can be used to verify that C 
Thus, G„(^(eo,eo,A)-4) 



- Ao(y)) = Op(||A — AqIIoo)- a similar argument 
,e''^'jyho{s){dA{s)-dAo{s)) = Op{\\A-Ao\\oo). 
= Op(|| A — Ao||oo)i as desired. □ 

Proof of Lemma 3. The proof of Lemma 3 is analogous to that of 
Lemma 4 in [17], which is for the more general odds-rate model. □ 

Proof of Lemma 4. The proof of Lemma 4 is analogous to that of
Lemma 2. We can similarly verify the linearity, continuity and boundedness
of $\ell_{t,\eta}(W_\eta)$, $\ell_{t,\theta,\eta}(W_\eta)$ and $\ell_{t,t,\eta}(W_\eta)$, whose concrete forms can be found in
[3]. The verification of (5) also follows reasoning similar to that used in the proof
of Lemma 2. The forms of $\mathrm{lik}_\eta(W_\eta)$ and $\mathrm{lik}_\eta(W_\eta, V_\eta)$ are specified in [3]. By
analysis similar to that in the proof of Lemma 2, we can show (2). This
completes the proof. □

Proof of Lemma 5. The proof of Lemma 5 is analogous to that of 
Lemma 4 in [17], which is for the more general odds-rate model. □ 

Proof of Lemma 6. Before we start the proof of Lemma 6, we first
present the following necessary computations according to (28).



(40) ie{y\z) 



z{w — a.Q—aiz) 

7^ 

xp(7 + ,3e^) + l + {'^ ~ 1) 
1 I (m-"0-Ql-)^ 

"I 



(41) ie{y\z) 



cxp(7 + /3e'^ )e 
■ (l + exp{7 + /3e^))^ 









^ 1 _ 3 






exp(7+/3e^)e'^ 
■ (l+oxp(7+;3e^))^ 



exp(7+/3e') 
' (l+oxp(7+/3e'=))^ 



75 

2z{w — qq —aiz) 





3(iti — ao —aiz}^ 
^3 




where ie{y\z) = d"^ / dO"^ \ogpg{y\z) . We now compute {t,9,ri) i-^ Q^+^t-/ 
dt'-d6"^i{t,6,r]), with the abbreviations 6t = 9t{9,ri) and rjt = rit{6,-r]), for 
(/, m) = (0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (1, 2), (2, 1), as follows: 

e, v) = 4(4 {yc\zc) + ieuvt (vr) - ^e,,,, (/?, 

e{t, 9, 77) = (^4 {yc\zc)ao + ^4,r,. (^ij) - ^^e,,,, (G^" ; 



i j k ■5'' 



9=e* i dt 



d ■ 

For brevity, we omit the complete versions of the above formulas and refer 
the interested reader to [3]. However, we present the complete description 
of i{t,0,ri) in the following to illustrate some functional properties of the 
above formulas, which will be used in the proof of Lemma 6: 

rd ^ , . t{ Vtif-etPet) + VtiieJJ^Pet) - ??(4-f^o(^)^Pet 
- -At,vt[yR) = % 



H X (-ao 

mPet VtPet 



and 



l + {f3-t)4Ho{z)Y vtPe, 

i]{Ho{z)pgJ r]t{Fp0,) -ri{Ho{zfpe,, 

X >ao. 



VtPet VtPdt 

In the preceding, $I_i$ is a five-dimensional vector with the $i$th element one
and the others zero, $\theta_j$ is the $j$th element of the vector $\theta$, $a_i$ is the $i$th element
of the vector $a_0$, $a_{ij} = a_ia_j$, $a_{ijk} = a_ia_ja_k$, and $p_{\theta_t}$, $\dot\ell_{\theta_t}$ and $\ddot\ell_{\theta_t}$ are respective
abbreviations for $p_{\theta_t}(y\mid z)$, $\dot\ell_{\theta_t}(y\mid z)$ and $\ddot\ell_{\theta_t}(y\mid z)$. $L'$ is the transpose of $L$ and
$H_0(z) = h_0(z) - \eta h_0$. Note that $H_0(z) \in C_1^1(\mathcal{Z})$ with zero mean after proper
rescaling. $A_{\theta_t,\eta_t}(\cdot)$ is an abbreviation for $A_{\theta_t,\eta_t}((h_0 - \eta h_0)/(1 + (\beta - t)a_0(h_0 -
\eta h_0)))(x)$. $\dot\ell_{\theta_t,\eta_t}(i,j)$ and $A_{\theta_t,\eta_t}(i,j)$ are the respective $(i,j)$th elements of the
square matrices $(\partial/\partial t)\dot\ell_{\theta_t,\eta_t}(y_R)a_0'(a_0a_0')^{-1}$ and $(\partial/\partial t)A_{\theta_t,\eta_t}(\cdot)a_0'(a_0a_0')^{-1}$.
The above notation is valid for $i,j,k = 1,\ldots,5$. We need the following two
lemmas to verify assumption 4. □



Lemma 6.1. Given $z$ in some compact set $\mathcal{Z}$, and $\theta$ and $\eta$ in some
neighborhood of $\theta_0$ and $\eta_0$, respectively, we have

ce{w) j Pe{y\z) dr]{z) 
where ce{w) = Moexp{^^){\w\ + 1) and < Mo,M < 00. 

Proof. Note that | \w\ — \ao + aiz\ \ <\w — oq — a\z\ < \w\ + |ao + , 
and z is in some compact set. Thus, we have the following inequalities: 

/.ON f HM2 \ ^ Pe{y\z) ^ f\w\Mi 
(43) exp <7 — / 1 n ^ / n ^exp( 



and 
(44) 



2ct2 J - Jpg{y\z)dr^{z)- "V 2a^ 



d f Pe{y\z) \ . 



dz \Jpe{y\z)dr]{z) 
where Mj is some positive finite number, i = l,2. □ 

Lemma 6.2. Let hg{y\z) = Ylf=ogi{z;a,-f,P){w - oq - a\z)^ for 6 eQ, 
where gi{z;a,^, (3) G C\{Z) and is continuous w.r.t. 9 for I = 0,1, . . . , L. The 
following then has an integrable envelope function in Lk{P) and is contin- 
uous at (^,7/1,772) when 9 is in some neighborhood of 9q and r]i is in some 
neighborhood of rjo for i = 1,2, and where K is any positive integer: 

(..^ fh _ I heiy\z)p9{y\z) dr]i{z) 

^ ^ •^^'"^'''^^^^ " IPe{y\z)d^,{z) ■ 

Proof. The following is the envelope function for fg^^ ^2^^)' ^^iv)'- 



! Pe{y\z) dri2{z) 
<X:(H + iyexpfM^)^F'^(,). 



1=0 
In the above, the first inequality follows from (45), and the second one
follows from (43) and $0 < \sigma_{\min} < \sigma < \sigma_{\max} < \infty$. Next, we only need to show
$P|F_\theta(y)|^K < \infty$ for any positive integer $K$. Accordingly,



p\Fe\yr<j:f PfE(iH+i)') 



K 



X exp 



(KMi 
2^ 



w\ ]pgQ{w,d = i\z) dw dr]o{z) 



Eir(i:(iH.i)fe.p(fMLH) 



X exp 



(lull - |ao + ai^l)^ 



lot.. 



d'wdr]Q{z) 



1 

1=0 

< CX), 



+ 00 



E(|u;| + 1)M exp 
vi=0 / 



(IH - Ms)- 
2rj2 

max 



dwdijQ^z) 



where M3 is some positive finite number. The second inequality follows from 
the inequality | l^l — \ao + aiz\ \ <\'w — oq — a\z\. 

It is trivial to show that ^r]x,r]2^v) continuous at given {r]i,r]2) is 
close to (T/o,r/o), since pe{y\z) and hg{y\z) are both continuous at ^0 for P- 
almost every Y. Next, we need to show /^^^ r)2^y^ continuous at (r/o,%) 
for fixed 6 around ^o- Accordingly, 



\fe iy) - fe ,1110,1120 iy)\ 



< 



< 



ivi - mo) [he 



Pe \ 



mpe 



(??i - mo) 



Kl=0 



moPe 
V2Pe I 



KmPeJ 



^ \Vio9i{z;cr , 7, P) {w-ao- aizYpe] 
1=0 



V2oPe 



{V2 - V20) 



Pe 
V2Pe 



<EH' 

1=0 



im-vw)[Gii 



+ (|u| + 1 



jVioPe 



moPe 



{V2 - mo 



Pe \ 
mPe) 

Pe \ 



mPe 



< Ki{w) X ||?/i - ryiollsLi + K2{w) x \\ri2 - moWsL^, 
where 

and where $h_\theta$ and $p_\theta$ are abbreviations for $h_\theta(y\mid z)$ and $p_\theta(y\mid z)$, respectively.
The second inequality follows from

$h_\theta(y\mid z) = \sum_{l=0}^{L}g_l(z;\sigma,\gamma,\beta)(w - \alpha_0 - \alpha_1z)^l = \sum_{l=0}^{L}G_l(z;\theta)w^l$,

where $G_l(z;\theta) = \sum_{k=l}^{L}g_k(z;\sigma,\gamma,\beta)(-\alpha_0 - \alpha_1z)^{k-l}(k!/(l!(k-l)!))$. It is trivial
to check that $\sum_i w_if_i(z)g_i(z)/2 \in C_1^1(\mathcal{Z})$ if $f_i(z)$ and $g_i(z)$ belong to $C_1^1(\mathcal{Z})$.
Since the $w_i$s are nonnegative weights which sum to one, we can find a
positive number $R$ such that $R^{-1}G_l(z;\theta) \in C_1^1(\mathcal{Z})$ for $0 \le l \le L$. The last
inequality follows from Lemma 6.1 and (43). Note that both $K_1(w)$ and
$K_2(w)$ are bounded in $L_1(P)$. This completes the proof. □

Verification of assumption 4. By repeatedly applying Lemma 6.2, 
we can check the continuity and boundedness conditions in assumption 4 by 
resetting hg{y\z) equal to aQig{y\z), aQig{y\z)ao and aQig{y\z)Ho{z)'^ao. 
□ 

Continuing with the proof of Lemma 6, we need the following verification 
of assumption 5 (which requires Lemmas 6.3 and 6.4 below). 

Verification of assumption 5. Lemma 6.3 is proved in [27]. The
more general version of this lemma can be found on pages 158-159 of [29].
We know the random variable $d$ is binary and thus not smooth. But, if
the classes of functions obtained by fixing $d$ to either 0 or 1 are both
P-Donsker when viewed as functions of the remaining arguments, then the
entire classes are P-Donsker. A more formal statement of this result can be
found in Lemma 9.2 of [23]. Thus, we consider the classes of functions in the
following two lemmas for $d = 0$ and $d = 1$, respectively.

Lemma 6.3. Let $\mathcal{X} = \bigcup_{j=1}^{\infty}I_j$ be a partition of $\mathbb{R}$ into bounded, convex
sets whose Lebesgue measure is bounded uniformly away from zero and
infinity. Let $\mathcal{G}$ be a class of functions $g:\mathcal{X} \mapsto \mathbb{R}$ such that the restrictions $g|_{I_j}$
belong to $C^1_{M_j}(I_j)$ for every $j$. $\mathcal{G}$ is then P-Donsker or P-Glivenko-Cantelli for
every probability measure $P$ on $\mathcal{X}$ if and only if $\sum_{j=1}^{\infty}M_jP^{1/2}(I_j) < \infty$ or
$\sum_{j=1}^{\infty}M_jP(I_j) < \infty$, respectively.
Lemma 6.4. The class (46) below is P-Donsker when $\theta$ is in some neighborhood
of $\theta_0$ and $(\eta_{1j},\eta_{2j})$ is in some neighborhood of $(\eta_0,\eta_0)$ over the compact support
$\mathcal{Z}$ for $j = 1,\ldots,k$. The form of $f^{h_j}_{\theta,\eta_{1j},\eta_{2j}}(y)$ is given in (45) and

(46) $L^H(y;\theta,\eta_1,\eta_2) \equiv \prod_{j=1}^{k}f^{h_j}_{\theta,\eta_{1j},\eta_{2j}}(y)$,

where $H = (h_{\theta1}(y\mid z),h_{\theta2}(y\mid z),\ldots,h_{\theta k}(y\mid z))'$, $\eta_1 = (\eta_{11}(z),\ldots,\eta_{1k}(z))'$,
$\eta_2 = (\eta_{21}(z),\ldots,\eta_{2k}(z))'$ and $h_{\theta j}(y\mid z) = \sum_{l=0}^{L_j}g_{lj}(z;\sigma,\gamma,\beta)(w - \alpha_0 - \alpha_1z)^l$,
where $g_{lj}(z;\sigma,\gamma,\beta) \in C_1^1(\mathcal{Z})$ for $l = 0,1,\ldots,L_j$ and $j = 1,\ldots,k$.


Proof. Without loss of generality, we assume d = 1 in the following 
proof. Based on (43), we have, in each Ij = {j — I < \w\ < j} , j = 1, . . . , k. 



(47) 



and 



1/^ 



Vl3,V2j 



(y)|<i:(M + l)'expfM|l 



a 



^ Vijila^^ejlPe) ^ yij {\h0jg^ log pe\pe) 



(48) 



V2jPe mjPe 
, Vij{\hej\pe) ^ V2j{\^logPe\pe) 



172 jPe 



mjPe 



<£,H,,y(„p(M^). 



/=0 



exp 



\w\Mi 



a" 



From the above two inequalities, we have that $|(\partial/\partial w)L^H(y;\theta,\eta_1,\eta_2)|$ is
bounded by some constant times $\sum_{l=0}^{R}(j+1)^l(\exp(jM_1k/2\sigma^2) + \exp(jM_1(k+1)/2\sigma^2))$,
where $R = 1 + \sum_{j=1}^{k}L_j$, in each $I_j$, $j \ge 1$. We can then apply
Lemma 6.3 to the function $w \mapsto L^H(y;\theta,\eta_1,\eta_2)$ with $d = 1$ in each $I_j$ defined
above. Since the tails of $P$ in $w$ are sub-Gaussian, the series $\sum_j(\sum_{l=0}^{R}(j+1)^l
(\exp(jM_1k/2\sigma^2) + \exp(jM_1(k+1)/2\sigma^2)))P(j - 1 < |w| \le j)^{1/2}$ is convergent.
Thus, we prove that (46) is P-Donsker, which is trivially P-Glivenko-Cantelli
by Lemma 6.3. □
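The summability condition invoked here can be illustrated numerically. The sketch below takes $W$ standard normal and uses a bound of the form $(j+1)^{l}e^{cj}$ on the interval $\{j - 1 < |w| \le j\}$; the polynomial degree and the constant $c$ are hypothetical stand-ins for the constants in the proof. Even with exponentially growing bounds, the square-root Gaussian tail makes the partial sums stabilize.

```python
import math

def interval_prob(j):
    """P(j - 1 < |W| <= j) for W ~ N(0, 1), computed with erfc to retain small tails."""
    return math.erfc((j - 1) / math.sqrt(2.0)) - math.erfc(j / math.sqrt(2.0))

def partial_sum(j_max, degree=3, c=2.0):
    """Partial sum of M_j * P(I_j)^{1/2} with the assumed bound M_j = (j + 1)^degree * exp(c * j)."""
    total = 0.0
    for j in range(1, j_max + 1):
        bound = (j + 1) ** degree * math.exp(c * j)
        total += bound * math.sqrt(interval_prob(j))
    return total

if __name__ == "__main__":
    for j_max in (5, 10, 20, 40, 60):
        print(j_max, partial_sum(j_max))
```

The terms behave like $\exp(cj - j^2/4)$ up to polynomial factors, so the series converges for any fixed $c$.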



Continuing with the proof of Lemma 6, we next apply Lemma 6.3 and
Lemma 6.4 to show that $x \mapsto \ddot\ell(t,\theta,\eta)(x)$ is P-Donsker when $(t,\theta,\eta)$ is
around $(\beta_0,\theta_0,\eta_0)$. The first term of $\ddot\ell(t,\theta,\eta)$, $a_0\ddot\ell_\theta(y\mid z)a_0'$, is P-Donsker,
provided the following are both P-Donsker, for $0 \le r,s,t < \infty$ in (50):


(49) /(z):zi 



(50) gr,s,t{w)-W ^ 



exp(7 + (5e^)e^ 
(l + exp(7 + /3e^))2^ 

z'^{w — ao — aizY 



(49) is trivially P-Donsker since the function u ^ nexp(7 + /3«)(1 + exp(7 + 
j3u))~'^ is Lipschitz continuous, where u = e^ \s P-Donsker. For (50), we need 
to consider Lemma 6.3. We have that \{d / dw)gr^s,t{w)\ < (j + |ao| + lo^il)'^""'^ 
when i — 1 <\w\ < j . Since the tails in of P are sub-Gaussian, the series 
J2j j^~^PU — < j)^^'^ is convergent. We have thus proven that the 

first term of x i— > i{t,0,i]){x) is P-Donsker. By setting hg(y\z) in Lemma 6.4 
equal to aQ£g{y\z), aQ£0{y\z)ao or a]^HQ{z), we can show that the remaining 
parts of X £{t,9,r]){x) are also P-Donsker. It can also be proven that 
X ^ £t,e{t,9 ,vi){x) is P-Donsker and that x ^ £^^\t,9,r]){x) is P-Glivenko- 
Cantelli by similar reasoning. Thus, assumption 5 is satisfied. The proof is 
complete since Lemmas 6.5, 6.6 and 6.7 below verify assumption 6. □ 

Lemma 6.5. (2) holds when $\eta$ is in some neighborhood of $\eta_0$.

Proof. Based on the form of i{9,9,r]), we can prove (2), provided 

(51) GMe,,,,iy)-f9,,o,mM = Op{\\v-m\\BL,). 

Note that h0{y\z) = J2iLo9ii^y /3)('"^ — ao — ol\z)^ for G 0, where gi{z\ cr, 
7, /3) G C\ (2:) for / = 0, 1, . . . , L. Thus, (51) will hold, provided 

(^52) 1 9{zWvd,Ay\A<^'f]{z) J g{z)w'^p0^{y\z)dr]o{z) 



Jpeo{y\z)dr]{z)\\r]-r]o\\BLi I Peo{y\z) d7]oiz)\\r] - VoWbLi' 

for g{z) ranging over Ci{Z), is P-Donsker for / = 0, 1, . . . ,L. Without loss of 
generality, it will be enough to verify this for d=l. Note that (52) can also 
be written as the sum of Qeo,rio,'q{w) and —Rgf^^rio,riiw), where 

r, ( ^ ! 9{z)w''peQ{y\z)d{i]-r]o){z) 



jpeo{y\z)dj]{z)\\r}-7^Q\\BLi 
and 

j g{z)w^Peo{y\z)driQ{z) I P9o{y\z) d{v - Vo){z) 



Rf) 



I Peo{y\z) drjoiz) J Peo{y\z) d'n{z)\\r] - rio\ 
We apply Lemma 6.3 to prove that $Q_{\theta_0,\eta_0,\eta}(w)$ is P-Donsker:

\{d/dw)Qeo,noA'^)\ 



< l\w 



-1 , |„,|«l(^-%)(m^logP9o> 



< 



+ \w 
l+l 

E 

m=l—l 



VoWbLi 



+ \w\ 



vpeoWv-mWBLi 



Vo)i9Peo)\ ^ viPeol^logPeol) 

Ti ^ 



WdoWV-VoWBLi VPeo 
\{V - Vo){sm.iz)peo/{VP9u))\ 



w\ 



IV - rjollBLi 



where g and pg^ are abbreviations for g{z) and po^d/lz), respectively, and 
Sm{z) G C\{Z) for m = Z — 1, Z, / + Combining this with (43), \ {d / dw)Q0Q^rio^ri{w) 

is bounded by a constant times exp(jMi/2(T^) X]m=z-i .^"^ ™ each region 
Ij = {j — 1 <\w\ < j}. It is thus proved that Qeo,'qQ,ri{w) is P-Donsker, by 
Lemma 6.3. Similarly, we can also show that Re^^,rjQ,r){w) is P-Donsker. This 
completes the proof. □ 

Lemma 6.6. (3) and (4) hold when $\eta$ is in some neighborhood of $\eta_0$.

Proof. Based on the form of £(9, 9,?]), (3) will follow provided 

v{g{z)w^Peo) Vo{9{z)w^Peo) 



(53) 



P 



Op{\\r]-riQ\\BLC, 



VPeo mPOo 
for any g{z) G C\{Z) and for / = 0,1,..., L. Now, (53) is bounded by 
the summation of P\Qg^^r,o,rii'^)\ and P\Rg^-^^r]o,r|{w)\■• where Qeo,vo,v(''^) = 
Qeo,vo,vi'^)\\v " 'HoWbLi and R0g^r,o,vi'^) = ^eo,vo,rii'^)\\'n - VoWbLi, and where 
Qeo,rio,r]iw) and ReQ^r]o,ri{w) are as defined in the proof of Lemma 6.5 above. 
Note that P|(56»o,»?o,»?('"^)l can be written as 



^q{w)\w\ 



g{z)pgg{w,d=0\z)d{rj - rjo){z) 



dwP{d = 0) 



+ 



g{z)pgg{w, d=l\z) d{r] - r?o)(^) 



dwP{d=l), 



where $\Delta_i(w) = \int p_{\theta_0}(w,d=i\mid z)\,d\eta_0(z)/\int p_{\theta_0}(w,d=i\mid z)\,d\eta(z)$ for $i = 0,1$.
Without loss of generality, we can show $P|\tilde Q_{\theta_0,\eta_0,\eta}(w)|$ is of the order $\|\eta -
\eta_0\|_{BL_1}$, provided the first integral on the right-hand side of the above equation
is of the same order. Based on the inequality $||w| - |\alpha_0 + \alpha_1z|| \le |w -
\alpha_0 - \alpha_1z| \le |w| + |\alpha_0 + \alpha_1z|$, we have $\Delta_0(w) \le \exp((M_1/2\sigma^2)|w|)$. We can
verify that $\exp(w^2/4\sigma^2)g(z)p_{\theta_0}(w,d=0\mid z) \in C_1^1(\mathcal{Z})$ for P-almost all $Y$ after
proper rescaling. Also, $\int_{\mathbb{R}}\Delta_0(w)|w|^l\exp(-w^2/4\sigma^2)\,dw$ is trivially bounded.
Thus, $P|\tilde Q_{\theta_0,\eta_0,\eta}(w)|$ is of the same order as $\|\eta - \eta_0\|_{BL_1}$. By similar analysis,
we can also show that $P|\tilde R_{\theta_0,\eta_0,\eta}(w)| = O_P(\|\eta - \eta_0\|_{BL_1})$. Since $\ell_{t,\theta}(t,\theta,\eta)$ is
similar to $\ddot\ell(t,\theta,\eta)$, we also have $P\ell_{t,\theta}(\theta_0,\theta_0,\eta) - P\ell_{t,\theta}(\theta_0,\theta_0,\eta_0) = O_P(\|\eta -
\eta_0\|_{BL_1})$. □

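Lemma 6.7. (5) holds when $\eta$ is in some neighborhood of $\eta_0$.

Proof. Based on previous discussions about the verification of (5), we
only need to show that $|\dot\ell(\theta_0,\theta_0,\eta) - \dot\ell(\theta_0,\theta_0,\eta_0)| \le C(y)\|\eta - \eta_0\|_{BL_1}$ and
$|\mathrm{lik}(\theta_0,\eta) - \mathrm{lik}(\theta_0,\eta_0) - A_0(\eta - \eta_0)\mathrm{lik}(\theta_0,\eta_0)| \le D(y)\|\eta - \eta_0\|^2_{BL_1}$, where $C(y)$
and $D(y)$ are both bounded in $L_2(P)$. The former inequality is easily proved
via techniques similar to those used in the proof of Lemma 6.6. For the latter,
we can write

$\mathrm{lik}(\theta_0,\eta) - \mathrm{lik}(\theta_0,\eta_0) = A_0(\eta - \eta_0)\mathrm{lik}(\theta_0,\eta_0) + p_{\theta_0}(y_C\mid z_C)\int_{\{z_C\}}d(\eta - \eta_0)\int p_{\theta_0}(y_R\mid z)\,d(\eta - \eta_0)$,

where $A_0(\eta - \eta_0) = \eta_0\{z_C\}^{-1}\int_{\{z_C\}}d(\eta - \eta_0) + (\int p_{\theta_0}(y_R\mid z)\,d\eta_0(z))^{-1} \times
\int p_{\theta_0}(y_R\mid z)\,d(\eta - \eta_0)$. It is now easy to show that $|p_{\theta_0}(y_C\mid z_C)\int_{\{z_C\}}d(\eta - \eta_0) \times
\int p_{\theta_0}(y_R\mid z)\,d(\eta - \eta_0)| \le D(y)\|\eta - \eta_0\|^2_{BL_1}$ since $p_{\theta_0}(y\mid z) \in C_1^1(\mathcal{Z})$ for P-almost
every $Y$ via rescaling. □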

Proof of Lemma 7. The proof is analogous to that of Lemma 3 in [17]. □